Abstract

We address the design of opportunistic spectrum access (OSA) strategies that allow secondary users 
to independently search for and exploit instantaneous spectrum availability. The design objective is to 
maximize the throughput of secondary users while limiting the probability of colliding with primary 
users. Integrated in the joint design are three basic components: a spectrum sensor at the physical (PHY) 
layer that identifies spectrum opportunities, a sensing strategy at the medium access control (MAC) layer 
that determines which channels in the spectrum to sense, and an access strategy, also at the MAC layer, 
that decides whether to access based on sensing outcomes that are subject to errors. 

We formulate the joint PHY-MAC design of OSA as a constrained partially observable Markov 
decision process (POMDP). Constrained POMDPs generally require randomized policies to achieve 
optimality, which are often intractable. By exploiting the rich structure of the underlying problem, we 
establish a separation principle for the joint design of OSA. Specifically, the optimal joint design can 
be carried out in two steps: first to choose the spectrum sensor and the access strategy to maximize 
the instantaneous throughput under a collision constraint, and then to choose the sensing strategy to 
maximize the overall throughput. This separation principle reveals the optimality of myopic policies 
for the design of the spectrum sensor and the access strategy, leading to closed-form optimal solutions. 
Furthermore, decoupling the design of the sensing strategy from that of the spectrum sensor and the 
access strategy, the separation principle reduces the constrained POMDP to an unconstrained one, which 
admits deterministic optimal policies. Numerical examples are provided to study the design tradeoffs, 
the interaction between the PHY layer spectrum sensor and the MAC layer sensing and access strategies, 
and the robustness of the ensuing design to model mismatch. 
I. Introduction

The exponential growth in wireless services and the physical limit on usable radio frequencies 
have motivated various dynamic spectrum sharing strategies, among which is opportunistic
spectrum access (OSA). OSA, first envisioned by Mitola [1] under the term "spectrum pooling"
and then investigated by the DARPA XG program [2], has recently received increasing attention 
due to its potential for improving spectrum efficiency [3], [4]. The basic idea of OSA is to allow 
secondary users to search for, identify, and exploit instantaneous spectrum opportunities while 
limiting the level of interference perceived by primary users (or licensees). 

In this paper, we address the design of OSA strategies for secondary users overlaying a slotted 
primary network. Integrated in the OSA design are three basic components: 1) a spectrum sensor 
at the physical (PHY) layer that identifies instantaneous spectrum opportunities; 2) a spectrum 
sensing strategy at the medium access control (MAC) layer that specifies which channels in the 
spectrum to sense in each slot; and 3) a spectrum access strategy, also at the MAC layer, that 
determines whether to access the chosen channels based on imperfect sensing outcomes. The 
design objective is to maximize the throughput of secondary users under the constraint that the 
probability of collision perceived by any primary user is below a pre-determined threshold. 

A. Fundamental Design Tradeoffs 

We provide first an intuitive understanding of the fundamental tradeoffs in the joint design of 
the three basic components. 

Spectrum Sensor: False Alarm vs. Miss Detection. The spectrum sensor of a secondary user
identifies spectrum opportunities by detecting the presence of primary signals, i.e., by performing 
a binary hypothesis test. With noise and fading, sensing errors are inevitable: false alarms occur 
when idle channels are detected as busy, and miss detections occur when busy channels are 
detected as idle. In the event of a false alarm, a spectrum opportunity is overlooked by the 
sensor, and eventually wasted if the access strategy trusts the sensing outcome. On the other 
hand, miss detections may lead to collisions with primary users. The tradeoff between false alarm 
and miss detection is captured by the receiver operating characteristic (ROC) of the spectrum 
sensor, which relates the probability of detection (PD) and the probability of false alarm (PFA) 
(see the example in Fig. 1, where we consider an energy detector). The design of the spectrum
sensor and the choice of the sensor operating point are thus important issues and should be
addressed by considering the impact of sensing errors on the MAC layer performance in terms of
throughput and collision probability. In particular, we are interested in the fundamental question
of which criterion should be adopted in the design of the spectrum sensor: the Bayes or the
Neyman-Pearson (NP) criterion. If the former, how do we choose the risks? If the latter, how
should we set the constraint on the PFA?

Fig. 1. The ROC of an energy detector. Each point on the ROC curve corresponds to a sensor operating characteristic resulting
from a different detection threshold of the energy detector. ($\epsilon$: probability of false alarm, $\delta$: probability of miss detection.)

Sensing Strategy: Gaining Immediate Access vs. Gaining Information for Future Use. Due to
hardware limitations and the energy cost of spectrum monitoring, a secondary user may not be 
able to sense all the channels in the spectrum simultaneously. A sensing strategy is thus needed 
for intelligent channel selection to track the rapidly varying spectrum opportunities. The purpose 
of a sensing strategy is twofold: to find idle channels for immediate access and to gain statistical 
information on the spectrum occupancy for better opportunity tracking in the future. The optimal 
sensing strategy should thus strike a balance between these two often conflicting objectives. 

Access Strategy: Aggressive vs. Conservative. Based on the imperfect sensing outcomes given
by the spectrum sensor, the secondary user needs to decide whether to access. An aggressive
access strategy may lead to excessive collisions with primary users while a conservative one may
result in throughput degradation due to overlooked opportunities. Whether to adopt an aggressive
or a conservative access strategy depends on the operating characteristic (false alarm vs. miss 
detection) of the spectrum sensor and the collision constraint at the MAC layer. Hence, a joint 
design of the PHY layer spectrum sensor and the MAC layer access strategy is necessary for 
optimality. 

B. Main Results 

By modeling primary users' spectrum occupancy as a Markov process, we establish a
decision-theoretic framework for the optimal joint design of OSA based on the theory of partially
observable Markov decision processes (POMDPs). This framework captures the fundamental 
design tradeoffs discussed above. Within this framework, the optimal OSA strategy is given by 
the optimal policy of a constrained POMDP. 

While powerful in problem modeling, POMDP suffers from the curse of dimensionality and 
does not easily lend itself to tractable solutions. Constraints on a POMDP further complicate
the problem, often demanding randomized policies to achieve optimality. Our goal is to develop 
structural results that lead to simple yet optimal solutions and shed light on the interaction 
between the PHY and the MAC layers of OSA networks. 

Single-Channel Sensing. We focus first on the case where the secondary user can sense and
access one channel in each slot (e.g., in the case of single carrier communications). We establish
a separation principle for the optimal joint design of OSA. We show that the joint design can
be carried out in two steps without losing optimality: first to choose a spectrum sensor and an
access strategy that maximize the instantaneous throughput (i.e., the expected number of bits that
can be delivered in the current slot) under the collision constraint, and then to choose a sensing
strategy to optimize the overall throughput. As stated below, the significance of this separation 
principle is twofold. 

• The separation principle reveals the optimality of myopic policies for the design of the 
spectrum sensor and the access strategy. Myopic policies that aim solely at maximizing the 
immediate reward ignore the impact of the current actions on the future reward. Hence, 
obtaining myopic policies becomes a static optimization problem instead of a sequential 
decision-making problem. While myopic policies are rarely optimal for a general POMDP, 
we show that the rich structure of the problem at hand renders an exception. As a consequence,
we are able to obtain an explicit design of the optimum spectrum sensor and a
closed-form optimal access strategy. Moreover, this closed-form optimal design allows us 
to characterize quantitatively the interaction between the PHY layer spectrum sensor and 
the MAC layer access strategy. 
• The separation principle decouples the design of the sensing strategy from that of the 
spectrum sensor and the access strategy. More importantly, the design of the sensing strat- 
egy is reduced to an unconstrained POMDP, which admits deterministic optimal policies. 
Unconstrained POMDPs have been well studied, and existing algorithms can be readily 
applied [5]-[8]. 

We also provide simulation examples to study design tradeoffs. We will see that miss detections 
are more harmful to the throughput of the secondary user than false alarms. The tradeoff 
study between the spectrum sensing time and the data transmission time indicates that the 
spectrum sensor should take fewer channel measurements as the maximum allowable probability 
of collision increases. In other words, when the collision constraint is less restrictive, the 
secondary user can spend less time in sensing, leaving more time in a slot for data transmission. 
Robustness studies show that the throughput loss due to inaccuracies in the assumed Markovian 
model parameters is small, and more importantly, the probability of collision perceived by the 
primary network is not affected by model mismatch. 

Multi-Channel Sensing. We then consider the scenario where the secondary user can sense
and access multiple channels simultaneously in each slot. We show that the separation principle 
still holds if the spectrum sensor and the access strategy are designed independently across 
channels. We note that such independent design is suboptimal since it ignores the potential 
correlation among channel occupancies. We thus propose two heuristic approaches to exploit 
channel correlation, one at the PHY layer and the other at the MAC layer. Simulation results 
show that exploiting channel correlation at the PHY layer is more effective than at the MAC 
layer. 

We also find that the performance of the PHY layer spectrum sensor can improve over time by 
incorporating the MAC layer sensing and access decisions. Such MAC layer decisions provide 
information on the evolution of the primary users' spectrum occupancy, from which the a priori 
probabilities of the hypotheses employed by the spectrum sensor can be learned. This finding, 
along with the quantitative characterization of the impact of the spectrum sensor on the access
strategy, illustrates the two-way interaction between the PHY and the MAC layers: the necessity
of incorporating the sensor operating characteristics into the MAC design and the benefit of 
exploiting the MAC layer information in the PHY design. 

C. Related Work 

Two types of spectrum opportunities have been considered in the literature: spatial and tem- 
poral. A majority of existing work on OSA focuses on exploiting spatial spectrum opportunities 
that are static or slowly varying in time (see [9]-[11] and references therein). A typical example
application is the reuse of locally unused TV broadcast bands. In this context, due to the slow
temporal variation of spectrum occupancy, real-time opportunity identification is not as critical
a component as in applications that exploit temporal spectrum opportunities, and the existing 
work often assumes perfect knowledge of spectrum opportunities in the whole spectrum at any 
location. 

The exploitation of temporal spectrum opportunities resulting from the bursty traffic of primary 
users is addressed in [12]-[15] under the assumption of perfect sensing. In [12], MAC protocols 
are proposed for an ad hoc secondary network overlaying a GSM cellular network. It is assumed 
that the secondary transmitter and receiver exchange information on which channel to use through 
a commonly agreed control channel. Different from this work, optimal distributed MAC protocols 
developed in [13] can synchronize the hopping patterns of the secondary transmitter and receiver 
without the aid of additional control channels. More recently, the design of optimal spectrum 
sensing and access strategies in a fading environment is addressed under an energy constraint 
in [14]. In [15], access strategies for a slotted secondary user searching for opportunities in an
un-slotted primary network are considered, where a round-robin single-channel sensing scheme
is used. Modeling of spectrum occupancy has been addressed in [16]. Measurements obtained
from spectrum monitoring test-beds demonstrate the Markovian transition between busy and idle
channel states in wireless LANs.

Although the issue of spectrum sensing errors has been investigated at the PHY layer [17]- 
[21], cognitive MAC design in the presence of sensing errors has received little attention. To 
the best of our knowledge, [22] is the first work that integrates the operating characteristic of 
the spectrum sensor at the PHY layer with the MAC design. A heuristic approach to the joint 
PHY-MAC design of OSA is proposed in [22]. In this paper, we establish a decision-theoretic 
framework within which the optimal joint design of OSA in the presence of sensing errors can 
be systematically addressed and the interaction between the PHY and the MAC layers can be
quantitatively characterized. Interestingly, the separation principle developed in this paper reveals 
that the heuristic approach proposed in [22] is optimal. 

For an overview of challenges and recent developments in OSA, readers are referred to [23].

D. Organization 

This paper is organized as follows. Section II describes the network model and the basic 
operations performed by a secondary user to exploit spectrum opportunities. In Section III, we 
introduce the three basic components of OSA and formulate their joint design as a constrained 
POMDP. In Section IV, we establish the separation principle for the optimal joint design of OSA 
with single-channel sensing. Section V extends the separation principle to multi-channel sensing 
scenarios. Section VI concludes this paper. 

II. Network Model 

Consider a spectrum that consists of $N$ channels (e.g., different frequency bands or tones in
an OFDM system), each with bandwidth $B_n$ ($n = 1, \ldots, N$). These channels are licensed
to a slotted primary network. We model the spectrum occupancy as a discrete-time homogeneous
Markov process with $2^N$ states. Specifically, let $S_n(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ denote the
occupancy of channel $n$ in slot $t$. The spectrum occupancy state (SOS) $S(t) = [S_1(t), \ldots, S_N(t)]$
follows a discrete Markov process with finite state space $\mathbb{S} = \{0, 1\}^N$. The transition probabilities
are denoted as $\{P_{s,s'}\}_{s,s' \in \mathbb{S}}$, where $P_{s,s'} = \Pr\{S(t) = s' \mid S(t-1) = s\}$ is the probability that the
SOS transits from $s \in \mathbb{S}$ to $s' \in \mathbb{S}$ at the beginning of slot $t$. Note that the transition probabilities
are determined by the dynamics of the primary traffic. We assume that they are known and
remain unchanged in $T$ slots.
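
As a minimal sketch of this occupancy model, the Python snippet below builds the $2^N$-state transition matrix and samples a short trajectory. It makes an extra assumption that the model above does not require: the $N$ channels evolve independently as two-state Markov chains. The per-channel parameters `p01` (busy to idle) and `p11` (idle to idle), the numerical values, and all function names are hypothetical.

```python
import itertools
import random

def sos_states(n_channels):
    """Enumerate all 2^N occupancy vectors; 0 = busy, 1 = idle."""
    return list(itertools.product((0, 1), repeat=n_channels))

def channel_step_prob(prev, nxt, p01, p11):
    """Pr{S_n(t) = nxt | S_n(t-1) = prev} for one two-state channel."""
    p_idle = p11 if prev == 1 else p01
    return p_idle if nxt == 1 else 1.0 - p_idle

def sos_transition_matrix(p01, p11):
    """P[s][s'] for independently evolving channels (assumption of this sketch)."""
    n = len(p01)
    states = sos_states(n)
    matrix = {}
    for s in states:
        row = {}
        for s_next in states:
            prob = 1.0
            for ch in range(n):
                prob *= channel_step_prob(s[ch], s_next[ch], p01[ch], p11[ch])
            row[s_next] = prob
        matrix[s] = row
    return matrix

def sample_next_state(matrix, s, rng):
    """Draw S(t) given S(t-1) = s."""
    candidates = list(matrix[s].keys())
    weights = [matrix[s][c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Two channels with hypothetical transition parameters.
P = sos_transition_matrix(p01=[0.2, 0.4], p11=[0.8, 0.6])
rng = random.Random(0)
trajectory = [(1, 1)]
for _ in range(5):
    trajectory.append(sample_next_state(P, trajectory[-1], rng))
```

With correlated channels, the same dictionary-of-rows layout still applies; only the construction of each row changes.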

We consider a secondary ad hoc network whose users independently and selfishly exploit
instantaneous spectrum opportunities in these $N$ channels.¹ At the beginning of each slot, a
secondary user with data to transmit chooses a set of channels to sense. A spectrum sensor (e.g.,
an energy detector) is used to detect the states of the chosen channels. Based on the sensing
outcomes, the secondary user decides which sensed channels to access. Due to hardware and
energy constraints, we assume that a secondary user can sense and access at most $L$ ($1 \le L \le N$)
channels in a slot. At the end of the slot, the receiver acknowledges a successful transmission.
The basic slot structure is illustrated in Fig. 2.

¹We assume that the inter-channel interference is negligible. Thus, a secondary user transmitting over an idle channel does
not interfere with primary users transmitting over other channels.

Our goal is to develop an optimal OSA strategy for the secondary user, which sequentially 
determines which channels in the spectrum to sense, how to design the spectrum sensor, and 
whether to access based on the imperfect sensing outcomes. The design objective is to maximize 
the throughput of the secondary user during a desired period of T slots under the constraint that 
the probability of collision $P_n(t)$ perceived by the primary network in any channel $n$ and slot $t$
is capped below a pre-determined threshold $\zeta$, i.e.,

$$P_n(t) \triangleq \Pr\{\Phi_n(t) = 1 \mid S_n(t) = 0\} \le \zeta, \quad \forall n, t, \tag{1}$$

where $\Phi_n(t) \in \{0 \text{ (no access)}, 1 \text{ (access)}\}$ denotes the access decision of the secondary user.
Remarks: 

1) We assume that the transition probabilities of the SOS are known or have been learned. In 
Section IV-F, we study the robustness of the optimal OSA design to a mismatched Markov
model. For the case where the SOS dynamics are unknown, formulations and algorithms
for POMDP with an unknown model exist in the literature [24] and can be applied to this
problem, but they are beyond the scope of this paper.

2) We use the conditional probability of collision $P_n(t)$ in the design constraint and impose the
collision constraint on any channel $n$ and slot $t$. This ensures that a primary user experiences
collisions no more than $\zeta \times 100\%$ of its transmission time regardless of where and when it
transmits. Note that if the unconditional probability of collision $\Pr\{\Phi_n(t) = 1, S_n(t) = 0\}$
is adopted, the constraint depends on the traffic load of primary users in channels chosen 
by the secondary users; primary users who have light traffic load may not be as well 
protected as those with heavy traffic load. 
3) We assume that secondary users exploit spectrum opportunities independently and selfishly.
That is, secondary users do not exchange their information on the SOS and everyone aims 
to maximize its own throughput without taking into consideration the interactions among 
secondary users. This assumption is suitable for secondary ad hoc networks where there is 
no central coordinator or dedicated control/communication channel. The secondary network 
can adopt a carrier sensing mechanism to avoid collisions among competing secondary 
users as detailed in [13], [22]. We point out that such selfish decisions may not be optimal 
in terms of network-level throughput. Nevertheless, this formulation allows us to focus on 
the basic components of OSA and highlight the interaction among them. 
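
The point made in Remark 2 can be checked with a small calculation. Since $\Pr\{\Phi_n(t) = 1, S_n(t) = 0\} = \Pr\{\Phi_n(t) = 1 \mid S_n(t) = 0\} \Pr\{S_n(t) = 0\}$, capping only the joint probability at $\zeta$ permits a conditional collision probability as large as $\zeta / \Pr\{S_n(t) = 0\}$. The sketch below evaluates this bound for two hypothetical traffic loads; the function name and the numbers are illustrative assumptions, not taken from the paper.

```python
def conditional_collision(joint_cap, prob_busy):
    """Largest Pr{access | busy} allowed when only the joint probability
    Pr{access, busy} is capped at joint_cap."""
    if not 0.0 < prob_busy <= 1.0:
        raise ValueError("prob_busy must lie in (0, 1]")
    return min(1.0, joint_cap / prob_busy)

zeta = 0.05
light = conditional_collision(zeta, prob_busy=0.1)  # lightly loaded primary user
heavy = conditional_collision(zeta, prob_busy=0.9)  # heavily loaded primary user
print(f"light load: up to {light:.3f} of its transmissions may collide")
print(f"heavy load: up to {heavy:.3f} of its transmissions may collide")
```

Under these assumed loads, the lightly loaded primary user could see far more than a fraction $\zeta$ of its transmissions collide, which is why the conditional form in (1) is used.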

III. Constrained POMDP Formulation 

Integrated in the optimal design of OSA are three basic components: a spectrum sensor, 
a sensing strategy, and an access strategy. In this section, we develop a decision-theoretic 
framework for the optimal joint design based on the theory of POMDP. We focus first on 
the single-channel sensing case where the secondary user can only sense and access one channel 
in each slot (L = 1). Extensions to multi-channel sensing scenarios are detailed in Section V.

A. Spectrum Sensor 

Suppose that channel n is chosen in slot t. The spectrum sensor detects the presence of primary 
users in this channel by performing a binary hypothesis test: 

$$H_0 : S_n(t) = 1 \text{ (idle)} \quad \text{vs.} \quad H_1 : S_n(t) = 0 \text{ (busy)}. \tag{2}$$

Let $\Theta_n(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ denote the sensing outcome (i.e., the result of the binary
hypothesis test). The performance of the spectrum sensor is characterized by the PFA $\epsilon_n(t)$
and the probability of miss detection (PM) $\delta_n(t)$:

$$\epsilon_n(t) = \Pr\{\text{decide } H_1 \mid H_0 \text{ is true}\} = \Pr\{\Theta_n(t) = 0 \mid S_n(t) = 1\}, \tag{3a}$$
$$\delta_n(t) = \Pr\{\text{decide } H_0 \mid H_1 \text{ is true}\} = \Pr\{\Theta_n(t) = 1 \mid S_n(t) = 0\}. \tag{3b}$$

For a given PFA $\epsilon_n(t)$, the largest achievable PD, denoted as $P_{D,\max}^{(n)}(\epsilon_n(t))$, can be attained by
the optimal NP detector with the constraint that the PFA is no larger than $\epsilon_n(t)$ or an optimal
Bayesian detector with a suitable set of risks [25, Sec. 2.2.1]. All operating points $(\epsilon, \delta)$ above
the best ROC curve are thus infeasible.

Let $\mathbb{A}_\delta(n) = \{(\epsilon, \delta) : 0 \le \epsilon \le 1 - \delta \le P_{D,\max}^{(n)}(\epsilon)\}$ denote all feasible operating points of
the spectrum sensor.² As illustrated in Fig. 3, the best ROC curve $P_{D,\max}^{(n)}$ achieved by the
optimal NP detector forms the upper boundary of the feasible set $\mathbb{A}_\delta(n)$. We also note that
every sensor operating point $(\epsilon_n, \delta_n)$ below the best ROC curve lies on a line that connects two
boundary points and hence can be achieved by randomizing between two optimal NP detectors
with properly chosen constraints on the PFA [25, Sec. 2.2.2]. For example, the operating point
$(\epsilon_n, \delta_n)$ as shown in Fig. 3 can be achieved by applying the optimal NP detector under the
constraint of PFA $\le \epsilon_n^{(1)}$ with probability $p = \frac{\epsilon_n - \epsilon_n^{(2)}}{\epsilon_n^{(1)} - \epsilon_n^{(2)}}$ and the optimal NP detector under the
constraint of PFA $\le \epsilon_n^{(2)}$ with probability $1 - p$. Therefore, the design of the spectrum sensor is
reduced to the choice of a desired sensor operating point in $\mathbb{A}_\delta(n)$.



Fig. 3. Illustration of the set $\mathbb{A}_\delta(n)$ of all feasible sensor operating points $(\epsilon_n, \delta_n)$. ($\delta_n^{(i)} = 1 - P_{D,\max}^{(n)}(\epsilon_n^{(i)})$, $i = 1, 2$. Horizontal axis: probability of false alarm $\epsilon$.)
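
The randomization argument above can be sketched numerically. The snippet below is an illustration under stated assumptions, not the detector used in the paper: it replaces the energy detector's ROC with that of a Gaussian mean-shift test, $P_D = \Phi_{\mathcal{N}}(d + \Phi_{\mathcal{N}}^{-1}(\epsilon))$, where the deflection `d = 1.5` and all function names are hypothetical. It computes the mixing probability between two boundary points and confirms that the resulting point has the requested false-alarm probability but a miss probability no better than the boundary at that false-alarm level.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the stand-in ROC

def pd_max(eps, deflection=1.5):
    """Best detection probability at false-alarm level eps for a Gaussian
    mean-shift test (a stand-in ROC; Fig. 1 uses an energy detector)."""
    if eps <= 0.0:
        return 0.0
    if eps >= 1.0:
        return 1.0
    return _STD.cdf(deflection + _STD.inv_cdf(eps))

def mixing_probability(eps, eps1, eps2):
    """Probability p of using the NP detector with PFA <= eps1 so that the
    randomized sensor has false-alarm probability eps."""
    if eps1 == eps2:
        raise ValueError("boundary points must differ")
    p = (eps - eps2) / (eps1 - eps2)
    if not 0.0 <= p <= 1.0:
        raise ValueError("eps must lie between eps1 and eps2")
    return p

def randomized_operating_point(eps, eps1, eps2, deflection=1.5):
    """(eps, delta) achieved by randomizing between two boundary points."""
    p = mixing_probability(eps, eps1, eps2)
    delta1 = 1.0 - pd_max(eps1, deflection)
    delta2 = 1.0 - pd_max(eps2, deflection)
    return (p * eps1 + (1.0 - p) * eps2, p * delta1 + (1.0 - p) * delta2)

eps_mix, delta_mix = randomized_operating_point(eps=0.2, eps1=0.3, eps2=0.05)
delta_best = 1.0 - pd_max(0.2)  # miss probability on the best ROC curve at eps = 0.2
```

Because the stand-in ROC is concave, `delta_mix` is at least `delta_best`: the randomized point is feasible but lies below the boundary.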

The design of the optimal NP detector is a well-studied classic problem, which is not the
focus of this paper. Our objective is to define the criterion and the constraint under which the
spectrum sensor should be designed or, equivalently, to find the optimal sensor operating point
$(\epsilon_n^*(t), \delta_n^*(t)) \in \mathbb{A}_\delta(n)$ to achieve the best tradeoff between false alarm and miss detection. Note
that the optimal sensor operating point may vary with time (see Section IV-D for an example).

²Since the two hypotheses in (2) play a symmetric role, we have assumed, without loss of generality, that the PD is no smaller
than the PFA, i.e., $1 - \delta \ge \epsilon$.

As discussed in Section I, if the secondary user completely trusts the sensing outcomes
in making access decisions, false alarms result in wasted spectrum opportunities while miss 
detections lead to collisions with primary users. To optimize the performance of the secondary 
user while limiting its interference to the primary network, we need to carefully design the 
spectrum sensor by considering its impact on the MAC layer performance in terms of throughput 
and collision probability. Further, the spectrum access decisions should be made by taking into 
account the sensor operating characteristics. A joint design of the PHY layer spectrum sensor 
and the MAC layer access strategy is thus necessary to achieve optimality. 

B. Sensing and Access Strategies 

In each slot, a sensing strategy decides which channel in the spectrum to sense, and an 
access strategy determines whether to access given the sensing outcome.³ Below we illustrate
the sequence of operations in each slot. 

At the beginning of slot $t$, the SOS transits to $S(t) = [S_1(t), \ldots, S_N(t)]$ according to the
transition probabilities of the underlying Markov process. The secondary user first chooses a
channel $a(t) \in \mathbb{A}_s = \{1, \ldots, N\}$ to sense and a feasible sensor operating point
$(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$. It then determines whether to access, $\Phi_a(t) \in \{0 \text{ (no access)}, 1 \text{ (access)}\}$, by taking
into account the sensing outcome $\Theta_a(t) \in \{0 \text{ (busy)}, 1 \text{ (idle)}\}$ provided by the spectrum sensor
that is designed according to the chosen operating point $(\epsilon_a(t), \delta_a(t))$. A collision with primary
users happens when the secondary user accesses a busy channel. At the end of this slot, the
receiver acknowledges a successful transmission, $K_a(t) \in \{0 \text{ (no ACK)}, 1 \text{ (ACK)}\}$. We assume
that the ACK is error-free.⁴

³An alternative formulation of the joint design is to combine the spectrum sensor with the access strategy. In this case, the
access decision is made directly based on the channel measurements. It can be readily shown that this formulation is equivalent
to the one adopted here.

⁴Note that the ACK is sent after the successful reception of data. Hence, the channel over which the ACK is transmitted is
ensured to be idle in this slot.
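
The sequence of operations in a slot can be sketched as a short simulation. The code below is a minimal illustration, assuming a fixed sensor operating point, an access rule that simply trusts the sensing outcome, and independent, equally likely occupancy across slots (a simplification of the Markov model, made only for brevity); all names and values are hypothetical. It estimates the collision probability conditioned on the sensed channel being busy, which for this access rule should be close to the miss probability $\delta$.

```python
import random

def sense(channel_state, eps, delta, rng):
    """Return sensing outcome Theta: 1 = idle, 0 = busy, with errors.
    eps = Pr{Theta = 0 | idle} (false alarm), delta = Pr{Theta = 1 | busy} (miss)."""
    if channel_state == 1:
        return 0 if rng.random() < eps else 1
    return 1 if rng.random() < delta else 0

def run_slot(state, channel, eps, delta, access_rule, rng):
    """One slot on an already-updated SOS `state`: sense, decide, acknowledge.
    Returns (theta, phi, ack, collided)."""
    s = state[channel]
    theta = sense(s, eps, delta, rng)
    phi = access_rule(theta)          # 1 = access, 0 = no access
    ack = s * phi                     # error-free ACK only after a success
    collided = (phi == 1 and s == 0)  # accessing a busy channel
    return theta, phi, ack, collided

def trust_sensor(theta):
    """Access if and only if the channel is sensed idle."""
    return theta

rng = random.Random(1)
busy_slots = 0
collisions = 0
for _ in range(20000):
    # i.i.d. occupancy per slot: a simplification of the Markov SOS model
    state = (rng.randint(0, 1), rng.randint(0, 1))
    _, _, ack, collided = run_slot(state, 0, eps=0.1, delta=0.05,
                                   access_rule=trust_sensor, rng=rng)
    busy_slots += int(state[0] == 0)
    collisions += int(collided)
collision_rate = collisions / busy_slots  # estimate of Pr{access | busy}
```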
C. Constrained POMDP Formulation

We show here that the joint design of OSA can be formulated as a constrained POMDP with
states, actions, transition probabilities, observations, and reward structure defined as follows.

State Space. The system state is given by the SOS of the primary network. The state space is
thus $\mathbb{S} = \{0, 1\}^N$.

Action Space. In each slot $t$, the secondary user needs to decide which channel to sense,
which sensor operating point to choose, and whether to access. Hence, the action in the POMDP
formulation consists of three parts: a sensing decision $a(t) \in \mathbb{A}_s$, a spectrum sensor design
$(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$, and an access decision $\Phi_a(t) \in \{0, 1\}$.

Transition Probabilities. The transition probabilities of the SOS are given by $\{P_{s,s'}\}$, which
are determined by the primary traffic.

Observation Space As will become clear later, optimal channel selection for opportunity 
tracking relies on the exploitation of the statistical information on the SOS provided by the 
observation history of the secondary users. To ensure synchronous hopping in the spectrum 
without introducing extra control message exchange, the secondary user and its desired receiver 
must have the same history of observations so that they make the same channel selection 
decisions. Since sensing errors may cause different sensing outcomes at the transmitter and 
the receiver, the acknowledgement $K_a(t) \in \{0, 1\}$ should be used as the common observation
in each slot. 

Reward. A natural definition of the reward is the number of bits that can be delivered by the
secondary user, which is assumed to be proportional to the channel bandwidth. Given sensing
action $a(t)$ and access action $\Phi_a(t)$, the immediate reward $R_{K_a(t)}$ can be defined as

$$R_{K_a(t)} = K_a(t) B_a = S_a(t) \Phi_a(t) B_a. \tag{4}$$

Hence, the expected total reward of the POMDP represents overall throughput, the expected total 
number of bits that can be delivered by the secondary user in T slots. 

Belief Vector. Due to partial spectrum monitoring and sensing errors, a secondary user cannot
directly observe the true SOS. It can, however, infer the SOS from its decision and observation
history. As shown in [5], the statistical information on the SOS provided by the entire decision
and observation history can be encapsulated in a belief vector $\Lambda(t) = \{\lambda_s(t)\}_{s \in \mathbb{S}} \in \Pi(\mathbb{S})$, where
$\lambda_s(t) \in [0, 1]$ denotes the conditional probability (given the decision and observation history)
that the SOS is $s \in \mathbb{S}$ at the beginning of slot $t$ prior to the state transition, and

$$\Pi(\mathbb{S}) \triangleq \Big\{ \{\lambda_s\}_{s \in \mathbb{S}} : \lambda_s \in [0, 1], \; \sum_{s \in \mathbb{S}} \lambda_s = 1 \Big\} \tag{5}$$

denotes the belief space, which includes all possible probability mass functions (PMFs) on the
state space $\mathbb{S}$. Given belief vector $\Lambda(t)$, the distribution of the system state $S(t)$ in slot $t$ after
the state transition is then given by

$$\Pr\{S(t) = s \mid \Lambda(t)\} = \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s}, \quad \forall s \in \mathbb{S}. \tag{6}$$

Policy. A joint design of OSA is given by policies of the above POMDP. Specifically, a sensing
policy $\pi_s$ specifies a sequence of functions, each mapping a belief vector $\Lambda(t) \in \Pi(\mathbb{S})$ at the
beginning of slot $t$ to a channel $a(t) \in \mathbb{A}_s$ to be sensed in this slot: $\pi_s = [\mu_s(1), \ldots, \mu_s(T)]$,
where $\mu_s(t) : \Pi(\mathbb{S}) \to \mathbb{A}_s$. Since the optimal policy for a finite-horizon POMDP is generally
non-stationary, functions $\{\mu_s(t)\}_{t=1}^{T}$ are not identical. Similarly, a sensor operating policy $\pi_\delta$
specifies, in each slot $t$, a spectrum sensor design $(\epsilon_a(t), \delta_a(t)) \in \mathbb{A}_\delta(a(t))$ based on the current
belief vector $\Lambda(t)$ and the chosen channel $a(t)$. An access policy $\pi_c$ specifies an access decision
$\Phi_a(t) \in \{0, 1\}$ in each slot $t$ based on the current belief vector $\Lambda(t)$ and the sensing outcome
$\Theta_a(t) \in \{0, 1\}$.

The above defined policies are deterministic. For unconstrained POMDPs, there always exist
deterministic optimal policies. For constrained POMDPs, however, we may need to resort to
randomized policies to achieve optimality. A randomized sensing policy $\pi_s$ defines a sequence
of functions, each mapping a belief vector $\Lambda(t)$ to a PMF on the set of channels, and a
randomized sensor operating policy $\pi_\delta$ defines the mapping from $\Lambda(t)$ to a probability density
function (PDF) on the set $\mathbb{A}_\delta(a(t))$ of feasible sensor operating points. A randomized access
policy $\pi_c$ maps $\Lambda(t)$ and sensing outcome $\Theta_a(t)$ to a transmission probability in each slot $t$. In
other words, the actions chosen in a randomized policy are probability distributions. Due to the
uncountable space of probability distributions, randomized policies are usually computationally
prohibitive.

Objective and Constraint. We aim to develop the optimal joint design of OSA $\{\pi_s^*, \pi_\delta^*, \pi_c^*\}$
that maximizes the expected total number of bits that can be delivered by the secondary user
(i.e., the expected total reward of the POMDP) in $T$ slots under the collision constraint given in
(1):

$$\{\pi_s^*, \pi_\delta^*, \pi_c^*\} = \arg\max_{\pi_s, \pi_\delta, \pi_c} \mathbb{E}_{\{\pi_s, \pi_\delta, \pi_c\}} \Big[ \sum_{t=1}^{T} R_{K_a(t)} \,\Big|\, \Lambda(1) \Big] \tag{7}$$
$$\text{s.t.} \quad P_a(t) = \Pr\{\Phi_a(t) = 1 \mid S_a(t) = 0\} \le \zeta, \quad \forall a, t,$$

where $\mathbb{E}_{\{\pi_s, \pi_\delta, \pi_c\}}$ represents the expectation given that policies $\{\pi_s, \pi_\delta, \pi_c\}$ are employed, $P_a(t)$
is the probability of collision perceived by the primary network in chosen channel $a(t)$ and slot
$t$, and $\Lambda(1)$ is the initial belief vector, which can be set to the stationary distribution of the
underlying Markov process if no information on the initial SOS is available.

We consider in (7) the non-trivial case where the conditional collision probability $P_a(t)$ is
well-defined, i.e., $\Pr\{S_a(t) = 0\} > 0$. Note that $\Pr\{S_a(t) = 0\} = 0$ (or 1) implies that the
system state $S_a(t)$ is known based on the current belief vector $\Lambda(t)$. In this case, the optimal
access decision is straightforward, and the design of the spectrum sensor becomes unnecessary
since the channel state is already known.
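
The role of the belief vector can be illustrated with a short sketch. The code below computes the post-transition state distribution as the belief-weighted sum of transition rows, $\sum_{s'} \lambda_{s'}(t) P_{s',s}$, and approximates a stationary initial belief by repeating that step. It assumes an ergodic chain so that the iteration converges; the single-channel transition values and function names are hypothetical.

```python
def predict_state_distribution(belief, transition):
    """Distribution of S(t) after the transition:
    sum over s' of belief[s'] * transition[s'][s]."""
    states = list(belief.keys())
    return {s: sum(belief[sp] * transition[sp][s] for sp in states)
            for s in states}

def prob_channel_idle(distribution, channel):
    """Marginal probability that one channel is idle (component == 1)."""
    return sum(p for s, p in distribution.items() if s[channel] == 1)

def stationary_belief(transition, iterations=500):
    """Approximate stationary distribution by repeated prediction
    (assumes the chain is ergodic, so the iteration converges)."""
    states = list(transition.keys())
    belief = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        belief = predict_state_distribution(belief, transition)
    return belief

# Hypothetical single-channel example: state (0,) = busy, (1,) = idle.
P = {(0,): {(0,): 0.8, (1,): 0.2},
     (1,): {(0,): 0.4, (1,): 0.6}}
belief = {(0,): 0.5, (1,): 0.5}
dist = predict_state_distribution(belief, P)
idle = prob_channel_idle(dist, 0)
initial = stationary_belief(P)
```

The same functions work unchanged for $N > 1$ when the states are length-$N$ tuples, although the $2^N$ state space quickly becomes large.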

IV. Separation Principle for Optimal OSA 

In this section, we solve the constrained POMDP given in (7) to obtain the optimal joint
design of OSA. Specifically, we establish a separation principle that reveals the optimality of 
deterministic policies and leads to closed-form optimal design of the spectrum sensor and the 
access strategy. It also allows us to characterize quantitatively the interaction between the PHY 
layer sensor operating characteristics and the MAC layer access strategy. 

A. Optimality Equation 

The first step to solving (7) is to express the objective and the constraint explicitly as functions
of the actions. We establish first the optimality of deterministic sensing and sensor operating 
policies, which significantly simplifies the action space. 

Optimality of deterministic policies In Proposition [H we show that it is sufficient to consider 
deterministic sensing and sensor operating policies in the optimal joint design of OSA. 

Proposition 1: For the optimal joint design of OSA given by (|7]), there exist deterministic 
optimal sensing and sensor operating policies. 

Proof: The proof is based on the concavity of the best ROC curve and the fact that the 
collision constraint is imposed on every channel. See details in Appendix A. I I I I 



As a result of Proposition 1, the secondary user needs to choose, in each slot,¹ a channel
$a \in \mathbb{A}_s$ to sense, a feasible sensor operating point $(\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a)$, and a pair of transmission
probabilities $(f_a(0), f_a(1))$, where

$$f_a(\theta) = \Pr\{\Phi_a = 1 \mid \Theta_a = \theta\} \in [0, 1]$$

is the probability of accessing channel $a$ given sensing outcome $\Theta_a = \theta \in \{0, 1\}$. The composite
action space is then given by

$$\mathbb{A} \triangleq \{(a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))) : a \in \mathbb{A}_s,\ (\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a),\ (f_a(0), f_a(1)) \in [0, 1]^2\}. \tag{8}$$

Objective function. Let $V_t(\Lambda(t))$ be the value function, which represents the maximum expected
reward that can be obtained starting from slot $t$ ($1 \le t \le T$) given belief vector $\Lambda(t)$ at the
beginning of slot $t$. Given that the secondary user takes action $A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$
and observes acknowledgement $K_a = k$, the reward that can be accumulated starting from
slot $t$ consists of two parts: the immediate reward $R_{K_a} = k B_a$ and the maximum expected future
reward $V_{t+1}(\Lambda(t+1))$, where

$$\Lambda(t+1) = \{\lambda_s(t+1)\}_{s \in \mathbb{S}} = \mathcal{T}(\Lambda(t) \mid A, k)$$

represents the updated knowledge of the SOS after incorporating the action $A$ and the
acknowledgement $k$ in slot $t$. Averaging over all possible states $s \in \mathbb{S}$ and acknowledgements $k \in \{0, 1\}$
and maximizing over all actions $A \in \mathbb{A}$, we arrive at the following optimality equation

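$$V_t(\Lambda(t)) = \max_{A \in \mathbb{A}} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s} \sum_{k=0}^{1} U_{s,k}(A) \left[ k B_a + V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k)) \right], \quad 1 \le t < T, \tag{9a}$$

$$V_T(\Lambda(T)) = \max_{A \in \mathbb{A}} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(T) P_{s',s} U_{s,1}(A) B_a, \tag{9b}$$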

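where $\sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s}$ is the distribution of the SOS in slot $t$ and $U_{s,k}(A) = \Pr\{K_a = k \mid S = s\}$
is the conditional distribution of the acknowledgement given current state $s$ and
action $A$. Since $K_a = S_a \Phi_a$, the conditional distribution $U_{s,k}(A)$ of the acknowledgement can
be calculated as

$$U_{s,1}(A) = \Pr\{K_a = 1 \mid S = s\} = \Pr\{S_a = 1 \mid S = s\} \Pr\{\Phi_a = 1 \mid S = s, S_a = 1\}$$

$$= s_a \sum_{\theta=0}^{1} \Pr\{\Theta_a = \theta \mid S_a = 1\} f_a(\theta) = s_a \left[ \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1) \right], \tag{10a}$$

$$U_{s,0}(A) = 1 - U_{s,1}(A), \tag{10b}$$

where $1_{[x]}$ is the indicator function and $\Pr\{S_a = 1 \mid S = s\} = 1_{[s_a = 1]} = s_a$ is given by the occupancy
state $s_a$ of channel $a$. Applying Bayes' rule, we obtain the updated belief vector $\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid A, k)$ as

$$\lambda_s(t+1) = \frac{\sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s} U_{s,k}(A)}{\sum_{s'' \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s''} U_{s'',k}(A)}. \tag{11}$$

¹ Time index $t$ will be omitted for notational convenience.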

We see from (11) that by adopting the acknowledgement $K_a$ as their observation, the transmitter
and the receiver will have the same updated belief vector $\Lambda(t+1)$, which ensures that they tune
to the same channel in the next slot.
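
The Bayes update described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `update_belief`, the dictionary layout of the belief and transition matrix, and the single-channel numbers in the demo (alpha = 0.2, beta = 0.8, false-alarm probability 0.1, access only when the channel is sensed idle) are assumptions made for the example.

```python
def update_belief(belief, trans, obs_prob, k):
    """One Bayes step: belief over the previous state -> belief after seeing ACK k.

    belief   : dict state -> probability (lambda_{s'})
    trans    : dict prev_state -> dict state -> probability (P_{s',s})
    obs_prob : function (state, k) -> Pr{K = k | S = state}  (U_{s,k}(A))
    """
    states = list(belief)
    # Distribution of the current state before observing: sum_{s'} lambda_{s'} P_{s',s}
    prior = {s: sum(belief[sp] * trans[sp][s] for sp in states) for s in states}
    # Weight each state by the likelihood of the observed acknowledgement
    post = {s: prior[s] * obs_prob(s, k) for s in states}
    total = sum(post.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in post.items()}


# Demo with one channel (assumed numbers): state 1 = idle, state 0 = busy.
ALPHA, BETA, EPS = 0.2, 0.8, 0.1
trans = {0: {0: 1 - ALPHA, 1: ALPHA}, 1: {0: 1 - BETA, 1: BETA}}

def ack_prob(s, k):
    # With f(0) = 0 and f(1) = 1: Pr{K = 1 | S = s} = s * (1 - eps)
    p1 = s * (1 - EPS)
    return p1 if k == 1 else 1 - p1

after_ack = update_belief({0: 0.5, 1: 0.5}, trans, ack_prob, 1)
after_no_ack = update_belief({0: 0.5, 1: 0.5}, trans, ack_prob, 0)
```

An ACK pins the channel state to idle, while the absence of an ACK only shifts probability toward busy, which is the asymmetry discussed later in connection with the separation principle.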

Note from (9) that the action $A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\}$ taken by the secondary user
affects the expected total reward in two ways: it acquires an immediate reward $R_{K_a} = k B_a$
and transforms the current belief vector $\Lambda(t)$ to a new one $\Lambda(t+1) = \mathcal{T}(\Lambda(t) \mid A, k)$, which
determines the future reward $V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))$. Hence, the function of the secondary user's
action is twofold: to exploit immediate spectrum opportunities and to gain information on the
SOS (characterized by belief vector $\Lambda(t+1)$) so that more rewarding decisions can be made
in the future. As a consequence, the optimal joint design of OSA should achieve the tradeoff
between these two often conflicting objectives. Myopic policies that aim solely at maximizing
the instantaneous throughput (i.e., the expected immediate reward) without considering future
consequences are generally suboptimal.

Collision constraint. The collision probability $P_a(t)$ is determined by the sensor operating
point $(\epsilon_a, \delta_a)$ and the transmission probabilities $(f_a(0), f_a(1))$:

$$P_a(t) = \Pr\{\Phi_a(t) = 1 \mid S_a(t) = 0\} = \sum_{\theta=0}^{1} \Pr\{\Theta_a = \theta \mid S_a = 0\} \Pr\{\Phi_a = 1 \mid \Theta_a = \theta, S_a = 0\}$$

$$= (1 - \delta_a) f_a(0) + \delta_a f_a(1) \le \zeta. \tag{12}$$

In principle, by solving (9) recursively (starting from the last slot $T$ using (9b)) under the
constraint of (12), we can obtain the maximum overall throughput $V_1(\Lambda(1))$ of the secondary
user and the corresponding policies $\{\pi_\delta^*, \pi_s^*, \pi_c^*\}$. However, (9) is generally intractable due to the
uncountable action space $\mathbb{A}$.



B. The Separation Principle 

Theorem 1: The Separation Principle for OSA with Single-Channel Sensing

The joint design of OSA given in (7) can be carried out in two steps without losing optimality.

• Step 1: Choose the sensor operating policy $\pi_\delta$ and the access policy $\pi_c$ to maximize
the instantaneous throughput subject to the collision constraint. Specifically, for any
chosen channel $a$, the optimal sensor operating point $(\epsilon_a^*, \delta_a^*)$ and transmission probabilities
$(f_a^*(0), f_a^*(1))$ are given by

$$\{(\epsilon_a^*, \delta_a^*), (f_a^*(0), f_a^*(1))\} = \arg\max_{\substack{(\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a) \\ (f_a(0), f_a(1)) \in [0,1]^2}} \mathbb{E}\left[ R_{K_a(t)} \mid \Lambda(t) \right]$$

$$= \arg\max_{\substack{(\epsilon_a, \delta_a) \in \mathbb{A}_\delta(a) \\ (f_a(0), f_a(1)) \in [0,1]^2}} \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1) \tag{13a}$$

$$\text{s.t.} \quad P_a(t) = (1 - \delta_a) f_a(0) + \delta_a f_a(1) \le \zeta. \tag{13b}$$

• Step 2: Using the optimal sensor operating and access policies $\{\pi_\delta^*, \pi_c^*\}$ given by (13),
choose the sensing policy to maximize the overall throughput. Specifically, the optimal sensing
policy $\pi_s^*$ is given by

$$\pi_s^* = \arg\max_{\pi_s} \mathbb{E}_{\{\pi_\delta^*, \pi_s, \pi_c^*\}} \left[ \sum_{t=1}^{T} R_{K_a(t)} \,\Big|\, \Lambda(1) \right]. \tag{14}$$

Proof: The proof is based on the convexity of the value function $V_t(\Lambda(t))$ with respect to
the belief vector $\Lambda(t)$ and the structure of the conditional observation distributions $U_{s,k}(A)$. See
Appendix B for details. □

The separation principle simplifies the optimal joint design of OSA in two ways. First, it
reveals that myopic policies, rarely optimal for a general POMDP, are optimal for the design
of the spectrum sensor and the access strategy. We can thus obtain the optimal spectrum sensor
$(\epsilon_a^*, \delta_a^*) \in \mathbb{A}_\delta(a)$ and the optimal transmission probabilities $(f_a^*(0), f_a^*(1)) \in [0, 1]^2$ by solving
the static optimization problem given in (13). This allows us to characterize quantitatively the
interaction between the spectrum sensor and the access strategy as given in Proposition 2 and to
obtain the optimal joint design in closed form as given in Theorem 2. While the proof is lengthy,
there is an intuitive explanation for this apparently surprising result. We note that upon receiving the
ACK $K_a = 1$, the secondary user knows exactly that the chosen channel is idle. However, when
$K_a = 0$ (no packet is received), the secondary receiver cannot tell whether the chosen channel
is busy or not accessed. Hence, $K_a = 1$ provides the secondary user with more information
on the current SOS. We also note that accessing the chosen channel maximizes not only the
instantaneous throughput but also the chance of receiving the more informative observation $K_a = 1$.
Hence, getting immediate reward and gaining information for more rewarding future decisions
are no longer conflicting here.

Second, the separation principle decouples the design of the sensing strategy from that of the
spectrum sensor and the access strategy. Furthermore, it reduces the design of the sensing strategy
from a constrained POMDP (7) to an unconstrained one with finite action space (14). This is
because the sensor operating points and the transmission probabilities determined by (13) have
ensured the collision constraint regardless of channel selections. The optimal sensing policy
is thus obtained by maximizing the overall throughput without any constraint. Unconstrained
POMDPs have been well-studied. The optimal sensing policy can thus be readily obtained by
using computationally efficient solution procedures in [5]-[8].

C. Interaction between the PHY and the MAC Layers 

Before solving for the optimal sensor operating and access policies, we study the interaction 
between the PHY layer spectrum sensor and the MAC layer access strategy. 

We note that when the spectrum sensor at the PHY layer is given, the separation principle 
still holds for the design of the sensing and access strategies. The optimal access strategy for a 
given spectrum sensor can thus be obtained. 

Proposition 2: Given a chosen channel $a$ and a feasible sensor operating point $(\epsilon_a, \delta_a)$, the
optimal transmission probabilities $(f_a^*(0), f_a^*(1))$ are given by

$$(f_a^*(0), f_a^*(1)) = \begin{cases} \left(0, \dfrac{\zeta}{\delta_a}\right), & \delta_a > \zeta, \\[2mm] \left(\dfrac{\zeta - \delta_a}{1 - \delta_a}, 1\right), & \delta_a < \zeta, \\[2mm] (0, 1), & \delta_a = \zeta. \end{cases} \tag{15}$$

Proof: The proof is based on the separation principle (13) and the fact that all feasible
operating points lie above the line $1 - \delta_a = \epsilon_a$. See details in Appendix C. □

As seen from Proposition 2, randomized access policies are necessary to achieve optimality
when $\delta_a \ne \zeta$. Moreover, Proposition 2 quantitatively characterizes the impact of the sensor
performance $\delta_a$ on the optimal access strategy $(f_a^*(0), f_a^*(1))$. As illustrated in Fig. 4, the set
$\mathbb{A}_\delta(a)$ of feasible sensor operating points can be partitioned into two regions: the "conservative"
region ($\delta_a > \zeta$) and the "aggressive" region ($\delta_a < \zeta$). When $\delta_a > \zeta$, with high probability,
the spectrum sensor detects a busy channel as idle (i.e., a miss detection occurs). Hence, the
access policy should be conservative to ensure that the collision probability is capped below $\zeta$.
Specifically, even when the sensing outcome $\Theta_a = 1$ indicates an idle channel, the secondary user
should only transmit with probability $\zeta / \delta_a < 1$. When the channel is sensed as busy ($\Theta_a = 0$), the
user should always refrain from transmission. On the other hand, when $\delta_a < \zeta$, the probability
of false alarm is high; the spectrum sensor is likely to overlook an opportunity. Hence, the
secondary user should adopt an aggressive access policy: always transmit when the channel is
sensed as idle and transmit with probability $(\zeta - \delta_a)/(1 - \delta_a) > 0$ even when the sensing outcome indicates a
busy channel. When $\delta_a = \zeta$, the access policy is to simply trust the sensing outcome: $\Phi_a = \Theta_a$.
We will show in Section IV-D that the splitting point $\delta_a = \zeta$ on the best ROC curve is
the optimal sensor operating point.
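
The conservative and aggressive access rules can be checked numerically. The sketch below, written under stated assumptions, evaluates the closed-form transmission probabilities for a given miss-detection probability and compares them against a brute-force grid search over the static problem (13). The function names and the operating points used in the demo are illustrative choices, not values from the paper.

```python
def optimal_access_probs(delta, zeta):
    """Closed-form (f(0), f(1)) for a sensor with PM delta and collision cap zeta."""
    if not (0.0 < delta < 1.0 and 0.0 < zeta < 1.0):
        raise ValueError("delta and zeta must lie strictly between 0 and 1")
    if delta > zeta:                       # conservative region
        return 0.0, zeta / delta
    if delta < zeta:                       # aggressive region
        return (zeta - delta) / (1.0 - delta), 1.0
    return 0.0, 1.0                        # trust the sensing outcome


def collision_prob(delta, f0, f1):
    # Pr{access | busy} = (1 - delta) f(0) + delta f(1)
    return (1.0 - delta) * f0 + delta * f1


def inst_throughput(eps, f0, f1):
    # Pr{access | idle} = eps f(0) + (1 - eps) f(1)
    return eps * f0 + (1.0 - eps) * f1


def grid_search(eps, delta, zeta, steps=200):
    """Best feasible objective on a uniform grid of (f(0), f(1)); a sanity check only."""
    best = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            f0, f1 = i / steps, j / steps
            if collision_prob(delta, f0, f1) <= zeta + 1e-12:
                best = max(best, inst_throughput(eps, f0, f1))
    return best


# Assumed operating points (eps, delta), each above the line 1 - delta = eps.
ZETA = 0.05
conservative = optimal_access_probs(0.2, ZETA)   # delta > zeta
aggressive = optimal_access_probs(0.02, ZETA)    # delta < zeta
```

In both regions the closed-form solution uses the entire collision budget, so the constraint is met with equality.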




[Figure 4: feasible sensor operating points plotted against the probability of false alarm $\epsilon$, with the region $\delta_a < \zeta$ labeled "aggressive" and the region $\delta_a > \zeta$ labeled "conservative".]

Fig. 4. Illustration of conservative and aggressive regions.

Similar to Proposition 2, we can quantitatively study the impact of the access strategy on
the spectrum sensor design by solving (13) for the optimal sensor operating points when the
transmission probabilities are given. This result is omitted to avoid unnecessary repetition. Details
can be found in [27].

D. Optimal Joint Design of Spectrum Sensor and Access Policy 

Optimizing (15) over all feasible sensor operating points, we obtain an explicit optimal design
for the spectrum sensor and a closed-form deterministic optimal access policy in Theorem 2.

Theorem 2: For any chosen channel $a$ in any slot, the optimal sensor should adopt the optimal
NP detector with constraint $\delta_a^* = \zeta$ on the PM. Correspondingly, the optimal access policy is to
trust the sensing outcome given by the spectrum sensor, i.e., $f_a^*(0) = 0$ and $f_a^*(1) = 1$.

Proof: The proof of Theorem 2 exploits the convexity of the set of feasible sensor
operating points, which follows directly from the concavity of the best ROC curve [25]. See
Appendix D for details. □

We find that the optimal sensor operating point coincides with the splitting point $\delta_a = \zeta$ of
the "conservative" region and the "aggressive" region on the best ROC curve (see Fig. 4). This
indicates that at $\delta_a^* = \zeta$, the best tradeoff between false alarm and miss detection is achieved
and the access policy does not need to be conservative or aggressive. We thus have a simple and
deterministic optimal access policy: trust the sensing outcome $\Phi_a = \Theta_a$, i.e., access if and only
if the channel is sensed to be available. Summarized below are the properties of the optimal
sensor operating and access policies given in Theorem 2.

Properties 1: The optimal spectrum sensor design and the optimal access policy are

P1.1 time-invariant and belief-independent.

P1.2 model-independent.

As a result of P1.1, the spectrum sensor can be configured off-line, and there is no need to
calculate and store the optimal transmission probabilities, leading to significant reduction in both
implementation complexity and memory requirement. The second property is that the optimal
design of the spectrum sensor and the access strategy does not require the knowledge of the
transition probabilities of the underlying Markov process. Since the probability of collision (12)
is solely determined by the sensor operating and access policies, P1.2 indicates that the collision
constraint on the joint OSA design can be ensured regardless of the accuracy of the Markovian
model used by the secondary user. In other words, the primary network is not affected by the
inaccurate model adopted by the secondary user. Model mismatch only affects the performance
of the secondary user (see Fig. 8 for a simulation example).
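
The claim that the best operating point sits at the boundary between the two regions can be illustrated with a toy ROC curve. The sketch below assumes a hypothetical concave ROC, $1 - \delta = \sqrt{\epsilon}$ (so $\epsilon = (1 - \delta)^2$), which is chosen only for convenience and is not the detector studied in the paper. For each miss-detection probability on a grid it applies the access rule matching that region and records the resulting instantaneous throughput.

```python
def access_probs(delta, zeta):
    """Access rule per region: conservative, aggressive, or trust the sensor."""
    if delta > zeta:
        return 0.0, zeta / delta
    if delta < zeta:
        return (zeta - delta) / (1.0 - delta), 1.0
    return 0.0, 1.0


def toy_roc_pfa(delta):
    # Hypothetical concave ROC: P_D = sqrt(P_FA), i.e. eps = (1 - delta)^2.
    return (1.0 - delta) ** 2


def throughput_at(delta, zeta, roc_pfa=toy_roc_pfa):
    eps = roc_pfa(delta)
    f0, f1 = access_probs(delta, zeta)
    return eps * f0 + (1.0 - eps) * f1     # Pr{access | idle}


ZETA = 0.05
grid = [i / 1000 for i in range(1, 1000)]  # delta from 0.001 to 0.999
best_delta = max(grid, key=lambda d: throughput_at(d, ZETA))
```

On this toy curve the throughput rises with the miss-detection probability throughout the aggressive region and falls throughout the conservative region, so the grid maximizer lands on the collision cap itself.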

E. Optimal Sensing Policy 

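As revealed by the separation principle, the optimal sensing policy can be obtained by solving
an unconstrained POMDP with finite action space $\mathbb{A}_s$. Specifically, by applying the optimal
spectrum sensor design and the optimal access policy given in Theorem 2 to (9), we simplify
the optimality equation as

$$V_t(\Lambda(t)) = \max_{a \in \mathbb{A}_s} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s} \sum_{k=0}^{1} U_{s,k}(a) \left[ k B_a + V_{t+1}(\mathcal{T}(\Lambda(t) \mid a, k)) \right], \quad 1 \le t < T, \tag{16a}$$

$$V_T(\Lambda(T)) = \max_{a \in \mathbb{A}_s} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(T) P_{s',s} U_{s,1}(a) B_a. \tag{16b}$$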

By applying $f_a^*(0) = 0$ and $f_a^*(1) = 1$ to (10), we obtain the conditional observation probability
$U_{s,k}(a)$ as

$$U_{s,1}(a) = s_a (1 - \epsilon_a^*), \quad U_{s,0}(a) = 1 - U_{s,1}(a), \tag{17}$$

where $\epsilon_a^*$ is the PFA associated with the PD $1 - \delta_a^* = 1 - \zeta$ on the best ROC curve.
The updated belief vector $\mathcal{T}(\Lambda(t) \mid a, k)$ can be obtained by substituting $U_{s,k}(A)$ in (11) with
$U_{s,k}(a)$.

It is shown in [5] that the value function of an unconstrained POMDP with finite action space
is piecewise linear and can be solved via linear programming. We can thus use the existing
computationally efficient algorithms [6]-[8] to solve (16) for the optimal sensing policy.

Although myopic sensor operating and access policies are shown to be optimal for the joint
design of OSA (see the separation principle), the myopic sensing policy is suboptimal in general.
Interestingly, it has been shown in [26] that the myopic sensing policy is optimal when the SOS
evolves independently and identically across channels. When the channel occupancy states are
correlated, the myopic approach can serve as a suboptimal solution with reduced complexity.

F. Simulation Examples

Here we provide simulation examples to study different factors that affect the optimal joint
design of OSA. We consider $N = 3$ channels, each with bandwidth $B_n = 1$. While the
separation principle applies to arbitrarily correlated SOS, we consider here the case where the
SOS evolves independently but not identically across these three channels for simplicity. As
illustrated in Fig. 5, the SOS dynamics are given by the transition probabilities $\alpha = [\alpha_1, \alpha_2, \alpha_3]$
and $\beta = [\beta_1, \beta_2, \beta_3]$, where $\alpha_n$ denotes the probability that channel $n$ transits from state 0 (busy)
to state 1 (idle), and $\beta_n$ denotes the probability that channel $n$ stays in state 1. In all figures,
the transition probabilities are given by $\alpha = [0.2, 0.4, 0.6]$ and $\beta = [0.8, 0.6, 0.4]$. We assume
that they remain unchanged in $T = 10$ slots. The maximum allowable probability of collision is
$\zeta = 0.05$. We use the normalized overall throughput $V_1(\Lambda(1))/T$, where $\Lambda(1)$ is the stationary
distribution of the SOS, to evaluate the performance of the optimal OSA design.
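
For independent channels, the unconstrained sensing problem can be sketched as a short recursion over per-channel idle probabilities. The code below is a brute-force illustration, not one of the solution procedures cited in the paper. It assumes the sensor operates at a fixed false-alarm probability `eps` (a free parameter here, since the value depends on the detector) and that access follows the sensing outcome, so an ACK occurs exactly when the sensed channel is idle and no false alarm occurs. The horizon is kept short because the recursion grows exponentially.

```python
ALPHA = (0.2, 0.4, 0.6)      # Pr{busy -> idle}, from the simulation setup
BETA = (0.8, 0.6, 0.4)       # Pr{idle -> idle}, from the simulation setup
BANDWIDTH = (1.0, 1.0, 1.0)  # B_n = 1 for every channel


def stationary_idle_prob(alpha, beta):
    # Solves pi = pi * beta + (1 - pi) * alpha for the idle probability pi.
    return alpha / (1.0 + alpha - beta)


def predict(omega, n):
    """Idle probability of channel n in the coming slot, given omega for the last slot."""
    return omega * BETA[n] + (1.0 - omega) * ALPHA[n]


def value(omegas, slots_left, eps):
    """Maximum expected reward over the remaining slots (plain recursion)."""
    if slots_left == 0:
        return 0.0
    prior = tuple(predict(w, n) for n, w in enumerate(omegas))
    best = 0.0
    for a in range(len(prior)):
        p_ack = prior[a] * (1.0 - eps)          # idle and no false alarm
        # ACK received: channel a is known to be idle.
        after_ack = prior[:a] + (1.0,) + prior[a + 1:]
        # No ACK: either busy, or idle with a false alarm (Bayes' rule).
        idle_no_ack = prior[a] * eps / (1.0 - p_ack) if p_ack < 1.0 else 0.0
        after_no_ack = prior[:a] + (idle_no_ack,) + prior[a + 1:]
        total = (p_ack * (BANDWIDTH[a] + value(after_ack, slots_left - 1, eps))
                 + (1.0 - p_ack) * value(after_no_ack, slots_left - 1, eps))
        best = max(best, total)
    return best


initial = tuple(stationary_idle_prob(a, b) for a, b in zip(ALPHA, BETA))
normalized_throughput = value(initial, 5, eps=0.1) / 5   # short horizon, assumed eps
```

With these transition probabilities every channel has the same stationary idle probability, so the channels differ only in how quickly their states change, which is what the sensing policy can exploit.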




Fig. 5. The Markov channel model.



To illustrate the interaction between the PHY layer spectrum sensor and the MAC layer access
policy, we consider a simple spectrum sensing scenario where the background noise and the
primary signal are modeled as white Gaussian processes. Let $\sigma_{n,0}^2$ and $\sigma_{n,1}^2$ denote, respectively,
the noise and the primary signal power in channel $n$. At the beginning of each slot, the spectrum
sensor takes $M$ independent measurements $\mathbf{Y}_n = [Y_{n,1}, \ldots, Y_{n,M}]$ from chosen channel $n$ and
performs the following binary hypothesis test:

$$\mathcal{H}_0\ (S_n = 1): \quad \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M, \sigma_{n,0}^2 \mathbf{I}_M),$$
$$\text{vs.} \quad \mathcal{H}_1\ (S_n = 0): \quad \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M, (\sigma_{n,1}^2 + \sigma_{n,0}^2) \mathbf{I}_M), \tag{18}$$

where $\mathcal{N}(\mathbf{0}_M, \sigma^2 \mathbf{I}_M)$ denotes the $M$-dimensional Gaussian distribution with identical mean 0
and variance $\sigma^2$ in each dimension. An energy detector is optimal under the NP criterion [25,
Sec. 2.6.2]:

$$\|\mathbf{Y}_n\|^2 = \sum_{i=1}^{M} Y_{n,i}^2 \underset{\mathcal{H}_0}{\overset{\mathcal{H}_1}{\gtrless}} \eta_n. \tag{19}$$

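The PFA and the PM of the energy detector are given by [25, Sec. 2.6.2]:

$$\epsilon_n = 1 - \gamma\left(\frac{M}{2}, \frac{\eta_n}{2\sigma_{n,0}^2}\right), \quad \delta_n = \gamma\left(\frac{M}{2}, \frac{\eta_n}{2(\sigma_{n,0}^2 + \sigma_{n,1}^2)}\right), \tag{20}$$

where $\gamma(m, a) = \frac{1}{\Gamma(m)} \int_0^a t^{m-1} e^{-t}\, dt$ is the incomplete gamma function. The optimal decision
threshold $\eta_n^*$ of the energy detector is chosen so that $\delta_n^* = \zeta$. Unless otherwise mentioned, we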
assume that M = 10, g = "^0 = ^'^^ ^n,i = = 5 dB for all channels n = 1, . . . , A^. 
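
The threshold selection for the energy detector can be sketched with the standard library alone. The code below restricts $M$ to even values, for which the regularized incomplete gamma function reduces to a finite sum, and finds the threshold that sets the PM to the collision cap by bisection. The power levels in the demo (noise power 1, i.e. 0 dB, and signal power $10^{0.5}$, i.e. 5 dB) are assumptions made for illustration, as are all function names.

```python
import math


def reg_lower_gamma_int(m, x):
    """Regularized lower incomplete gamma function for a positive integer m.

    Uses the identity gamma(m, x) = 1 - exp(-x) * sum_{j=0}^{m-1} x^j / j!.
    """
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def _check_even(M):
    if M <= 0 or M % 2:
        raise ValueError("this sketch supports positive even M only")


def pfa(M, eta, noise_pow):
    """Pr{energy > eta | channel idle}."""
    _check_even(M)
    return 1.0 - reg_lower_gamma_int(M // 2, eta / (2.0 * noise_pow))


def pm(M, eta, noise_pow, signal_pow):
    """Pr{energy < eta | channel busy}."""
    _check_even(M)
    return reg_lower_gamma_int(M // 2, eta / (2.0 * (noise_pow + signal_pow)))


def threshold_for_pm(M, zeta, noise_pow, signal_pow):
    """Bisection on eta: the PM is increasing in the threshold."""
    lo, hi = 0.0, 1.0
    while pm(M, hi, noise_pow, signal_pow) < zeta:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if pm(M, mid, noise_pow, signal_pow) < zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed powers: noise at 0 dB, primary signal at 5 dB.
NOISE, SIGNAL, ZETA = 1.0, 10 ** 0.5, 0.05
eta_star = threshold_for_pm(10, ZETA, NOISE, SIGNAL)
eps_star = pfa(10, eta_star, NOISE)
```

Odd values of $M$ would need a general incomplete gamma routine, such as the one provided by a numerical library.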




[Figure 6: three panels plotted against the probability of miss detection $\delta$ (from 0.01 to 0.1), with the "aggressive" and "conservative" regions marked.]

Fig. 6. The impact of sensor operating characteristics on the performance of the optimal OSA design.

1) Impact of Sensor Operating Characteristics: Fig. 6 shows the impact of sensor operating
characteristics on the secondary user's throughput and the optimal access policy. The upper figure
plots the maximum throughput $V_1(\Lambda(1))/T$ vs. the PM $\delta$. The optimal transmission probabilities
$(f^*(0), f^*(1))$ are shown in the middle and the lower figures, respectively. We can see that the
maximum throughput is achieved at $\delta^* = \zeta = 0.05$ and the transmission probabilities change
with $\delta$ as given by Theorem 2. Interestingly, the throughput curve is concave with respect to
$\delta$ in the "aggressive" region ($\delta < \zeta$) and convex in the "conservative" region ($\delta > \zeta$). The
performance thus decays at a faster rate when the sensor operating point drifts toward the
"conservative" region. This suggests that miss detections are more harmful to the OSA design
than false alarms.

Fig. 7. The impact of the number of channel measurements on the performance of the optimal OSA design.

2) Impact of the Number of Channel Measurements: In this example, we study the tradeoff
between the spectrum sensing time, which is determined by the number $M$ of channel measurements
taken by the spectrum sensor, and the transmission time. Taking more channel measurements
can improve the fidelity of the sensing outcome but will reduce the data transmission time and
hence the number of transmitted bits. We are thus motivated to study the throughput of the
secondary user as a function of $M$ for different maximum allowable probabilities of collision $\zeta$.
We assume that each channel measurement takes $c = 5\%$ of a slot time. The transmission time
is thus given by $1 - Mc = 1 - 0.05M$. Assuming that the number of bits that can be transmitted
by the secondary user is proportional to both the channel bandwidth and the transmission time,
we modify the immediate reward of the POMDP to $R_{K_a} = (1 - Mc) K_a B_a$.

Fig. 7 shows that the throughput of the secondary user increases and then decreases with
the number $M$ of channel measurements. Note that the PM is a function of the number $M$ of
channel measurements and the detection threshold $\eta^*$ of the energy detector (as seen from (20)).
When the PM is fixed to be $\delta^* = \zeta$ according to the separation principle, the detection threshold
$\eta^*$ increases with $M$, and hence the PFA $\epsilon^*$ decreases with $M$. As a consequence, when $M$
is small, the throughput of the secondary user is limited by the large PFA. On the other hand,
when $M$ is large, the PFA is reduced at the expense of less transmission time in each slot,
which also leads to low throughput. We also observe that the optimal number $M^*$ of channel
measurements at which the throughput is maximized decreases with the maximum allowable
collision probability $\zeta$. The reason behind this observation is that the PM $\delta^*$ increases with $\zeta$,
and hence fewer measurements are required to achieve the same PFA (as seen from (20)).
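
The sensing-time tradeoff can be illustrated with a rough proxy. The sketch below scores each even $M$ by $(1 - Mc)(1 - \epsilon^*(M))$, the transmission-time fraction multiplied by the probability that an idle channel is correctly sensed as idle when the PM is pinned to the collision cap. This proxy ignores the sensing policy and the channel dynamics, so it is not the throughput plotted in Fig. 7; the power levels (noise 0 dB, signal 5 dB) and the restriction to even $M$ are assumptions made for the example.

```python
import math


def gamma_cdf_int(m, x):
    # Regularized lower incomplete gamma for integer m (finite-sum identity).
    term, total = 1.0, 1.0
    for j in range(1, m):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total


def pfa_at_target_pm(M, zeta, noise_pow, signal_pow):
    """PFA of the energy detector whose threshold makes the PM equal to zeta."""
    if M <= 0 or M % 2:
        raise ValueError("this sketch supports positive even M only")
    m = M // 2
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(m, hi) < zeta:      # bracket the normalized threshold
        hi *= 2.0
    for _ in range(200):                    # bisection
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(m, mid) < zeta:
            lo = mid
        else:
            hi = mid
    eta = 2.0 * (noise_pow + signal_pow) * 0.5 * (lo + hi)
    return 1.0 - gamma_cdf_int(m, eta / (2.0 * noise_pow))


def throughput_proxy(M, zeta, c=0.05, noise_pow=1.0, signal_pow=10 ** 0.5):
    # (fraction of the slot left for data) x Pr{idle channel sensed as idle}
    return (1.0 - M * c) * (1.0 - pfa_at_target_pm(M, zeta, noise_pow, signal_pow))


candidates = range(2, 20, 2)                # even M with positive transmission time
best_m = max(candidates, key=lambda M: throughput_proxy(M, 0.05))
```

Even this crude proxy peaks at an interior value of $M$: few measurements leave the PFA large, while many measurements leave little time to transmit.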
Fig. 8. The impact of mismatched Markov model on the performance of the optimal OSA strategy.



3) Impact of Mismatched Markov Model: We have assumed that the secondary user has
perfect knowledge of the transition probabilities of the underlying Markov model. The transition
probabilities learned by the secondary user, however, may have errors. Suppose that the true
transition probabilities are given by $\alpha$ and $\beta$. The secondary user employs the optimal OSA
design based on inaccurate transition probabilities $\alpha'$ and $\beta'$. In the upper half of Fig. 8, we
plot the relative throughput loss as a function of the relative estimation error $E$ in transition
probabilities, where $E$ is the relative error (in percent) of $\alpha'$ with respect to $\alpha$, taken to be equal
to that of $\beta'$ with respect to $\beta$. Note that when $E = 0$, the secondary
user has perfect knowledge of the transition probabilities and hence achieves the maximum
throughput. Inaccurate knowledge can cause performance loss. We observe that the relative
throughput loss is below 4% even when the relative error is up to 20%. In the lower figure, we
examine the probability of collision perceived by the primary network. We see that the probability
of collision is not affected by inaccurate transition probabilities, which confirms P1.2.

V. OSA with Multi-Channel Sensing

In this section, we address the joint design of OSA in the case where multiple channels can
be sensed and accessed simultaneously in each slot ($L > 1$). We focus on the extension of the
separation principle developed in Section IV.

A. Optimal Joint Design

Within the POMDP framework presented in Section III, we first describe the three basic
components of OSA with multi-channel sensing and then derive the optimality equation.

1) Spectrum Sensor: Suppose that a set $\mathcal{A}(t) \subset \{1, \ldots, N\}$ of channels is chosen in slot $t$,
where $|\mathcal{A}(t)| = L > 1$. The spectrum sensor performs a $2^L$-ary hypothesis test:

$$\begin{aligned} \mathcal{H}_0 &: \mathbf{S}_{\mathcal{A}}(t) = [1, 1, \ldots, 1], \\ \mathcal{H}_1 &: \mathbf{S}_{\mathcal{A}}(t) = [0, 1, \ldots, 1], \\ &\ \ \vdots \\ \mathcal{H}_{2^L - 1} &: \mathbf{S}_{\mathcal{A}}(t) = [0, 0, \ldots, 0], \end{aligned} \tag{21}$$

where $\mathbf{S}_{\mathcal{A}}(t) = \{S_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$ denotes the occupancy states of the chosen channels
$\mathcal{A}(t)$ in the current slot. The a priori probabilities of these hypotheses can be learned from the
observation and decision history, which is characterized by the belief vector. For example, given
current belief vector $\Lambda(t)$ and chosen channels $\mathcal{A}(t)$, the a priori probability of $\mathcal{H}_0$ in this slot
is given by

$$\Pr\{\mathcal{H}_0\} = \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s} \prod_{n \in \mathcal{A}(t)} 1_{[s_n = 1]}. \tag{22}$$

This indicates how sensor and access information at the MAC layer can be used in the
design of the spectrum sensor at the PHY layer.
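
The a priori probabilities of the hypotheses can be computed directly from the belief vector. The sketch below is an illustration under assumptions: it represents each network state as a tuple of per-channel occupancies, builds the joint transition matrix for independent two-state channels (the paper allows arbitrary correlation), and sums the predicted probability mass by occupancy pattern of the chosen channels. The helper names and the two-channel demo numbers are hypothetical.

```python
from itertools import product


def joint_transition(alpha, beta):
    """Joint transition probabilities P[s_prev][s] for independent channels."""
    n = len(alpha)
    states = list(product((0, 1), repeat=n))
    trans = {}
    for s_prev in states:
        trans[s_prev] = {}
        for s in states:
            p = 1.0
            for i in range(n):
                to_idle = beta[i] if s_prev[i] == 1 else alpha[i]
                p *= to_idle if s[i] == 1 else 1.0 - to_idle
            trans[s_prev][s] = p
    return trans


def hypothesis_priors(belief, trans, chosen):
    """Prior probability of every occupancy pattern of the chosen channels."""
    priors = {}
    for s_prev, lam in belief.items():
        for s, p in trans[s_prev].items():
            pattern = tuple(s[n] for n in chosen)
            priors[pattern] = priors.get(pattern, 0.0) + lam * p
    return priors


# Demo: two channels, uniform belief, both channels sensed (L = 2).
trans = joint_transition(alpha=(0.2, 0.4), beta=(0.8, 0.6))
belief = {s: 0.25 for s in trans}
priors = hypothesis_priors(belief, trans, chosen=(0, 1))
prob_h0 = priors[(1, 1)]      # all chosen channels idle
```

The entry for the all-idle pattern corresponds to (22); the other entries give the priors of the remaining hypotheses in (21).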

Let $\mathbf{\Theta}_{\mathcal{A}}(t) = \{\Theta_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$ denote the sensing outcomes. Sensing errors occur if
the spectrum sensor mistakes one hypothesis for another, i.e., $\mathbf{\Theta}_{\mathcal{A}}(t) \ne \mathbf{S}_{\mathcal{A}}(t)$. Since there are
in total $2^L$ hypotheses, the performance of the spectrum sensor can be specified by a set $\mathcal{E}(t)$ of
$2^L(2^L - 1)$ error probabilities:

$$\mathcal{E}(t) = \{\Pr\{\text{detect } \mathcal{H}_i \mid \mathcal{H}_j \text{ is true}\} : 0 \le i, j \le 2^L - 1,\ i \ne j\}. \tag{23}$$

The optimal design of the spectrum sensor should achieve a tradeoff among these $2^L(2^L - 1)$ error
probabilities. Let $\mathbb{A}_\delta^{(L)}(\mathcal{A})$ include all sets of achievable error probabilities. A sensor operating
policy specifies, in each slot $t$, a feasible sensor operating point (i.e., a set of achievable error
probabilities) $\mathcal{E}(t) \in \mathbb{A}_\delta^{(L)}(\mathcal{A}(t))$ based on the current belief vector $\Lambda(t)$ and the chosen channels
$\mathcal{A}(t)$.

2) Sensing and Access Policies: At the beginning of each slot $t$, a sensing policy specifies a
set $\mathcal{A}(t) \in \mathbb{A}_s^{(L)} \triangleq \{\mathcal{A} \subset \{1, \ldots, N\},\ |\mathcal{A}| = L\}$ of channels to be sensed based on the current
belief vector $\Lambda(t) \in \Pi(\mathbb{S})$. Based on $\Lambda(t)$ and the imperfect sensing outcomes $\mathbf{\Theta}_{\mathcal{A}}(t)$ given
by the spectrum sensor, an access policy decides whether to access: $\mathbf{\Phi}_{\mathcal{A}}(t) = \{\Phi_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$.
At the end of slot $t$, the receiver acknowledges every successful transmission. The
acknowledgments (i.e., the common observation of the transmitter and the receiver) are denoted
by $\mathbf{K}_{\mathcal{A}}(t) = \{K_n(t)\}_{n \in \mathcal{A}(t)} \in \{0, 1\}^L$, where $K_n(t) = S_n(t) \Phi_n(t)$. Given observations $\mathbf{K}_{\mathcal{A}}(t)$
and sensing action $\mathcal{A}(t)$, the secondary user obtains an immediate reward $R_{\mathbf{K}_{\mathcal{A}}(t)}$:

$$R_{\mathbf{K}_{\mathcal{A}}(t)} = \sum_{n \in \mathcal{A}} K_n(t) B_n. \tag{24}$$

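3) Optimality Equation: In a similar fashion as Section III, we can formulate the optimal
design of OSA with multi-channel sensing as a constrained POMDP. We can also show that
Proposition 1 holds, i.e., it is sufficient to consider deterministic sensor operating and sensing
policies for the optimal design of OSA with multi-channel sensing. Therefore, in each slot, the
secondary user needs to make the following decisions: which set $\mathcal{A} \in \mathbb{A}_s^{(L)}$ of channels to sense,
which sensor operating point $\mathcal{E} \in \mathbb{A}_\delta^{(L)}(\mathcal{A})$ to choose, and which set
$\mathcal{F} = \{f_n(\theta_{\mathcal{A}})\}_{n \in \mathcal{A},\, \theta_{\mathcal{A}} \in \{0,1\}^L}$ of
transmission probabilities to use, where

$$f_n(\theta_{\mathcal{A}}) = \Pr\{\Phi_n = 1 \mid \mathbf{\Theta}_{\mathcal{A}} = \theta_{\mathcal{A}}\} \in [0, 1]$$

is the probability of accessing chosen channel $n$ given belief vector and sensing outcome $\mathbf{\Theta}_{\mathcal{A}} = \theta_{\mathcal{A}}$.
The composite action space is denoted by

$$\mathbb{A}^{(L)} = \{(\mathcal{A}, \mathcal{E}, \mathcal{F}) : \mathcal{A} \in \mathbb{A}_s^{(L)},\ \mathcal{E} \in \mathbb{A}_\delta^{(L)}(\mathcal{A}),\ \mathcal{F} \in [0, 1]^{L 2^L}\}.$$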
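We can obtain the optimality equation and the design constraint as

$$V_t(\Lambda(t)) = \max_{A \in \mathbb{A}^{(L)}} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(t) P_{s',s} \sum_{\mathbf{k}_{\mathcal{A}} \in \{0,1\}^L} U_{s,\mathbf{k}_{\mathcal{A}}}(A) \left[ R_{\mathbf{k}_{\mathcal{A}}} + V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, \mathbf{k}_{\mathcal{A}})) \right], \quad 1 \le t < T, \tag{25a}$$

$$V_T(\Lambda(T)) = \max_{A \in \mathbb{A}^{(L)}} \sum_{s \in \mathbb{S}} \sum_{s' \in \mathbb{S}} \lambda_{s'}(T) P_{s',s} \sum_{\mathbf{k}_{\mathcal{A}} \in \{0,1\}^L} U_{s,\mathbf{k}_{\mathcal{A}}}(A) R_{\mathbf{k}_{\mathcal{A}}}, \tag{25b}$$

$$\text{s.t.} \quad P_n(t) = \sum_{\theta_{\mathcal{A}}, \mathbf{s}_{\mathcal{A}} \in \{0,1\}^L} h_{\mathbf{S}_{\mathcal{A}} | S_n}(\mathbf{s}_{\mathcal{A}} \mid 0)\, f_{\mathbf{\Theta}_{\mathcal{A}} | \mathbf{S}_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid \mathbf{s}_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \le \zeta, \quad \forall n, t, \tag{25c}$$

where $h_{\mathbf{S}_{\mathcal{A}} | S_n}(\mathbf{s}_{\mathcal{A}} \mid i) = \Pr\{\mathbf{S}_{\mathcal{A}} = \mathbf{s}_{\mathcal{A}} \mid S_n = i\}$ is the conditional distribution of channel occupancy
states given current belief vector $\Lambda(t)$, $f_{\mathbf{\Theta}_{\mathcal{A}} | \mathbf{S}_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid \mathbf{s}_{\mathcal{A}}) = \Pr\{\mathbf{\Theta}_{\mathcal{A}} = \theta_{\mathcal{A}} \mid \mathbf{S}_{\mathcal{A}} = \mathbf{s}_{\mathcal{A}}\}$ is
the error probability determined by the current sensor operating point $\mathcal{E}$, and the conditional
distribution $U_{s,\mathbf{k}_{\mathcal{A}}}(A)$ of observations can be calculated as

$$\begin{aligned} U_{s,\mathbf{k}_{\mathcal{A}}}(A) &\triangleq \Pr\{\mathbf{K}_{\mathcal{A}} = \mathbf{k}_{\mathcal{A}} \mid S = s\} \\ &= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} f_{\mathbf{\Theta}_{\mathcal{A}} | \mathbf{S}_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid \mathbf{s}_{\mathcal{A}}) \prod_{n \in \mathcal{A}} \Pr\{K_n = k_n \mid \mathbf{\Theta}_{\mathcal{A}} = \theta_{\mathcal{A}}, \mathbf{S}_{\mathcal{A}} = \mathbf{s}_{\mathcal{A}}\} \\ &= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} f_{\mathbf{\Theta}_{\mathcal{A}} | \mathbf{S}_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid \mathbf{s}_{\mathcal{A}}) \prod_{n \in \mathcal{A}} \left[ k_n s_n f_n(\theta_{\mathcal{A}}) + (1 - k_n)(1 - s_n f_n(\theta_{\mathcal{A}})) \right]. \end{aligned} \tag{26}$$

The updated belief vector $\mathcal{T}(\Lambda(t) \mid A, \mathbf{k}_{\mathcal{A}})$ can be obtained by substituting (26) into (11).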

In principle, the optimal decisions $\{\mathcal{A}^*, \mathcal{E}^*, \mathcal{F}^*\}$ in each slot can be obtained by solving
(25) recursively. However, without any structural results on this constrained POMDP, (25) is
computationally prohibitive. A natural question here is whether there exists a separation principle
similar to Theorem 1 that can be used to simplify the optimal design of OSA with multi-channel
sensing.

B. Separation Principle 

We show that under certain conditions, the separation principle established for the single- 
channel sensing case can be applied in the multi-channel sensing scenarios. 

Theorem 3: When the spectrum sensor and the access policy are designed independently
across channels, the separation principle developed in Theorem 1 is valid for optimal OSA
design with multi-channel sensing. In this case, the optimal spectrum sensor adopts the optimal
NP detector with PM equal to $\zeta$, which detects the occupancy of a chosen channel by using the
measurements from this channel, and the optimal access decision on a chosen channel is to trust
the sensing outcome from this channel. The optimal sensing policy can be obtained by solving
an unconstrained POMDP.

Proof: The proof is built upon that of Theorem 1. See Appendix E. □

We emphasize that the extension of the separation principle to multi-channel sensing scenarios
is based on the condition that the spectrum sensor and the access policy are designed
independently across channels. Specifically, we assume that the occupancy of a channel is detected
independently of the measurements taken from other channels and the access decision on a
channel is made independently of the sensing outcomes from other channels. Intuitively, in this
case, the design of the spectrum sensor and access policy for the multi-channel sensing case ($L > 1$)
can be treated as $L$ independent design problems, one for each chosen channel. Hence, the
optimal design for the single-channel case can be extended to $L > 1$.

Theorem 3 provides sufficient conditions under which the design given by the separation
principle (referred to as the SP approach for simplicity) is optimal. In Proposition 3, we show
that the SP approach is locally optimal (i.e., maximizes the instantaneous throughput) under
certain relaxed conditions.

Proposition 3: Suppose that the spectrum sensor is designed independently across channels
while the access policy jointly exploits the sensing outcomes from all channels. The SP approach
is locally optimal when channels evolve independently.

Proof: See Appendix F. □

It may sound plausible that the SP approach is (globally) optimal when channels evolve
independently since in this case the sensing outcomes are independent across channels and
independent access decisions seem to suffice. Interestingly, counterexamples can be constructed
to show that introducing correlation among access decisions across channels can improve the
overall throughput. The rationale behind this is that the joint access design enables the secondary
user to trade the immediate access to "bad" channels (e.g., channels with small bandwidth) for
information on the occupancy states of "good" channels, leading to potentially more rewarding
future decisions. Specifically, as noted in Section IV-B, the secondary user cannot distinguish
a busy channel $S_n = 0$ from the decision of no access $\Phi_n = 0$ when observing $K_n = 0$.
However, if the access decision $\Phi_m$ on channel $m \ne n$ is correlated with $\Phi_n$, then we can infer
the occupancy state of channel $n$ from both $K_m$ and $K_n$. That is, by sacrificing the immediate
access to channel $m$ with small bandwidth, we can obtain more information on the occupancy
state of channel $n$.

C. Heuristic Approaches to Exploiting Channel Correlation 

While simplifying the design of OSA with multi-channel sensing, the condition that the 
spectrum sensor and the access policy are designed independently across channels can cause 
throughput degradation since the correlation among channel occupancies is ignored. We propose 
two heuristic approaches to exploit the channel correlation: one at the PHY layer and the other 
at the MAC layer. 

1 ) Exploiting Channel Correlation at the PHY Layer: When the occupancy states are cor- 
related across channels, we have correlated channel measurements at the PHY layer. Hence, 
the measurements at all chosen channels should be jointly exploited in spectrum opportunity 
identification. With this in mind, we propose a heuristic design of the spectrum sensor: it performs 
L binary hypothesis tests, one for each chosen channel, by using all channel measurements 
and adopting the optimal NP detector with PM equal to We point out that, different from 
the SP sensor, the proposed spectrum sensor performs L composite hypothesis tests since it 
uses all channel measurements and the occupancy states of other channels are unknown in 
each hypothesis test. Hence, the structure of the optimal NP detector adopted by this heuristic 
sensor relies on the joint distribution of the channel occupancy states, which is given by the 
belief vector (see Section IV-DI for an example). That is, the spectrum sensor design is affected 
by the observation and decision history and thus varies with time. As illustrated in Fig. |9l 
the performance of this spectrum sensor improves over time, resulting from more informative 
distribution of the SOS obtained from accumulating observations. Note that the design of this 
spectrum sensor is much simpler than the 2^-ary hypothesis test given in (|2TI) . 

Based on the sensing outcomes given by this sensor that exploits measurements from all 
chosen channels, access decisions are made independently across channels, i.e., access if and 
only if a channel is sensed as idle. We refer to this approach as the PHY layer approach. 

Proposition 4: Suppose that the access policy is designed independently across channels while 
the spectrum sensor jointly exploits the measurements taken from all chosen channels. The PHY 
layer approach is locally optimal. When channels evolve independently, the PHY layer approach 
reduces to the SP approach. 

Proof: See Appendix G. 

Note that the PHY layer approach is locally optimal even when channels are correlated. 

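2) Exploiting Channel Correlation at the MAC Layer: When channel occupancies are correlated, 
so are the sensing outcomes given by the spectrum sensor. Hence, the channel correlation 
can also be exploited at the MAC layer by making access decisions jointly across channels. A 
heuristic MAC layer approach is to adopt the spectrum sensor of the SP approach, i.e., to detect 
the occupancy state of a channel by using only the measurements of this channel, and then 
choose the access policy that exploits sensing outcomes from all chosen channels to maximize 
the instantaneous throughput. Specifically, for given chosen channels $\mathcal{A} \in \mathbb{A}_s$ and belief vector 
$\Lambda(t)$ in slot $t$, we choose transmission probabilities 
$\mathcal{F} = \{f_n(\theta_{\mathcal{A}})\}_{n \in \mathcal{A},\, \theta_{\mathcal{A}} \in \{0,1\}^L} \in [0,1]^{L 2^L}$ as follows: 

$$\mathcal{F}^* = \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} \mathbb{E}\left[R_{K_{\mathcal{A}}} \mid \Lambda(t)\right] \tag{27a}$$

$$= \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} \sum_{n \in \mathcal{A}} B_n \Pr\{K_n = 1\}$$

$$= \arg\max_{\mathcal{F} \in [0,1]^{L 2^L}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\} \sum_{\theta_{\mathcal{A}},\, s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid 1)\, \ell_{\Theta_{\mathcal{A}}|S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \tag{27b}$$

$$\text{s.t.}\quad P_n(t) = \sum_{\theta_{\mathcal{A}},\, s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid 0)\, \ell_{\Theta_{\mathcal{A}}|S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})\, f_n(\theta_{\mathcal{A}}) \le \zeta, \quad \forall n \in \mathcal{A}, \tag{27c}$$

where the conditional probability $h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid i)$ ($i = 0, 1$) of the current channel occupancies 
and the sensing error probability $\ell_{\Theta_{\mathcal{A}}|S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}})$ are defined below (25). 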

The access policy given in (27) can be obtained via linear programming. Proposition 5 
shows that this MAC layer approach is equivalent to the SP approach when the SOS evolves 
independently across channels. This agrees with our intuition that when channels are independent, 
so are the sensing outcomes from the chosen channels. Hence, independent access decisions 
perform as well as the joint one in terms of instantaneous throughput. 
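
The linear program in (27) can be illustrated with a small sketch. It rests on an observation that is ours rather than the paper's: the variables $f_n(\theta_{\mathcal{A}})$ of different channels appear in separate terms of the objective and in separate constraints, so each channel yields a one-constraint linear program (a fractional knapsack) that a greedy rule solves. The function name `greedy_access` and the dictionaries `gain` and `cost` are hypothetical; `gain[theta]` stands for $\sum_{s} h(s \mid 1)\,\ell(\theta \mid s)$ and `cost[theta]` for $\sum_{s} h(s \mid 0)\,\ell(\theta \mid s)$ of one channel.

```python
def greedy_access(gain, cost, zeta):
    """Maximize sum(gain[k] * f[k]) s.t. sum(cost[k] * f[k]) <= zeta, 0 <= f[k] <= 1.

    gain and cost are dicts keyed by sensing outcomes; all values are assumed >= 0.
    """
    f = {k: 0.0 for k in gain}
    budget = zeta
    # Outcomes with zero collision cost come first, then decreasing gain per unit cost.
    order = sorted(gain, key=lambda k: (cost[k] > 0, -(gain[k] / cost[k]) if cost[k] > 0 else 0.0))
    for k in order:
        if gain[k] <= 0.0:
            continue
        if cost[k] == 0.0:
            f[k] = 1.0  # transmitting on this outcome never causes a collision
            continue
        if budget <= 0.0:
            break
        take = min(1.0, budget / cost[k])
        f[k] = take
        budget -= take * cost[k]
    return f


# Single-channel sanity check: PFA = 0.3, PM = zeta = 0.05.
example = greedy_access({(0,): 0.3, (1,): 0.7}, {(0,): 0.95, (1,): 0.05}, 0.05)
```

In the single-channel check the sketch returns "transmit if and only if sensed idle", which is the access rule of the SP approach. A general-purpose LP solver would be the natural choice if further constraints coupled the channels.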

Proposition 5: Suppose that the spectrum sensor is designed independently across channels 
while the access policy jointly exploits the sensing outcomes from all chosen channels. When 
channels evolve independently, the MAC layer approach reduces to the SP approach and hence 
is locally optimal. 

Proof: See Appendix F. 

D. Simulation Examples 

Next, we study the performance of the SP, the PHY layer, and the MAC layer approaches. Note 
that these three approaches differ in the spectrum sensor and the access policy. We can employ 
any sensing policy to compare their performance. For simplicity, we consider a myopic sensing 
policy that chooses the set A of channels to maximize the expected instantaneous throughput 
under perfect sensing: i.e., for given belief vector $\Lambda(t)$ in slot $t$, 

$$\mathcal{A}^* = \arg\max_{\mathcal{A} \in \mathbb{A}_s} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\}. \tag{28}$$
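
A minimal sketch of the myopic channel selection in (28), assuming the belief vector is stored as a dictionary from joint occupancy states (tuples with 1 for idle and 0 for busy) to probabilities. The names `idle_marginals` and `myopic_channels` are hypothetical, and the exhaustive search over subsets is only meant for small numbers of channels.

```python
from itertools import combinations


def idle_marginals(belief):
    """Return Pr{S_n = 1} for every channel n from a belief over joint states."""
    num_channels = len(next(iter(belief)))
    return [sum(p for s, p in belief.items() if s[n] == 1) for n in range(num_channels)]


def myopic_channels(belief, bandwidth, num_sensed):
    """Pick the subset of num_sensed channels maximizing sum_n B_n * Pr{S_n = 1}."""
    marg = idle_marginals(belief)

    def score(subset):
        return sum(bandwidth[n] * marg[n] for n in subset)

    return max(combinations(range(len(bandwidth)), num_sensed), key=score)


# Four unit-bandwidth channels, L = 3, belief concentrated on two joint states.
chosen = myopic_channels({(0, 1, 1, 1): 0.7, (1, 0, 1, 1): 0.3}, [1, 1, 1, 1], 3)
```

Because the objective is additive over channels, sorting the channels by $B_n \Pr\{S_n = 1\}$ and keeping the top $L$ would give the same answer; the subset search is kept here only to mirror the form of (28).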

We adopt the model of Gaussian noise and Gaussian primary signal described in Section IV-F. 
In this case, the spectrum sensor of the SP approach employs an energy detector given in (19). 
The detection threshold $\eta_n$ of the energy detector is chosen so that the PM is fixed at $\zeta$. 
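
As a rough illustration of how such a threshold can be set, the sketch below treats the special case of a single real-valued Gaussian sample ($M = 1$), for which comparing the energy $Y^2$ with $\eta_n$ is the same as comparing $|Y|$ with $\sqrt{\eta_n}$. The real-valued model, the function name `energy_detector_m1`, and the linear-scale powers in the example are assumptions made for this sketch.

```python
from statistics import NormalDist


def energy_detector_m1(noise_var, signal_var, zeta):
    """Return (threshold on |Y|, PM, PFA) for one real Gaussian sample with PM = zeta."""
    std_normal = NormalDist()
    busy_std = (noise_var + signal_var) ** 0.5
    idle_std = noise_var ** 0.5
    # Miss: the channel is busy, yet |Y| falls below the threshold t.
    # Pr{|Y| < t | busy} = 2 * Phi(t / busy_std) - 1 = zeta.
    t = busy_std * std_normal.inv_cdf((1.0 + zeta) / 2.0)
    pm = 2.0 * std_normal.cdf(t / busy_std) - 1.0
    # False alarm: the channel is idle, yet |Y| exceeds the threshold t.
    pfa = 2.0 * (1.0 - std_normal.cdf(t / idle_std))
    return t, pm, pfa


# Noise power 1 and signal power 10 in linear scale, collision constraint 0.05.
threshold, pm, pfa = energy_detector_m1(1.0, 10.0, 0.05)
```

The energy threshold is then `threshold ** 2`. With one sample and a tight miss constraint the false alarm probability is large, which is why additional information about the channel state is valuable.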

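Using the measurements $\{\mathbf{Y}_n\}_{n \in \mathcal{A}}$ from all chosen channels, the sensor employed by the PHY 
layer approach performs a composite hypothesis test for each chosen channel $n$: 

$$\begin{aligned}
\mathcal{H}_0\ (S_n = 1):\quad & \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M,\ \sigma_{n,0}^2 \mathbf{I}_M), \\
& \mathbf{Y}_m \sim \mathcal{N}(\mathbf{0}_M,\ (\sigma_{m,0}^2 + 1_{[S_m = 0]}\, \sigma_{m,1}^2) \mathbf{I}_M), \quad \forall m \in \mathcal{A} \setminus \{n\}, \\
\mathcal{H}_1\ (S_n = 0):\quad & \mathbf{Y}_n \sim \mathcal{N}(\mathbf{0}_M,\ (\sigma_{n,1}^2 + \sigma_{n,0}^2) \mathbf{I}_M), \\
& \mathbf{Y}_m \sim \mathcal{N}(\mathbf{0}_M,\ (\sigma_{m,0}^2 + 1_{[S_m = 0]}\, \sigma_{m,1}^2) \mathbf{I}_M), \quad \forall m \in \mathcal{A} \setminus \{n\}.
\end{aligned} \tag{29}$$
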
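Note that the distribution of the measurements under each hypothesis depends on the distribution 
of the current channel occupancy states $S_{\mathcal{A}} = \{S_n\}_{n \in \mathcal{A}}$, which is given by $h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid i)$ (defined 
below (25)) and can be calculated from the current belief vector $\Lambda(t)$. In this case, the optimal 
NP detector for (29) is given by a likelihood ratio test [25, Sec. 2.5]: 

$$\frac{\sum_{s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid 1) \prod_{m \in \mathcal{A}} p(\mathbf{Y}_m \mid S_m = s_m)}{\sum_{s_{\mathcal{A}} \in \{0,1\}^L} h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid 0) \prod_{m \in \mathcal{A}} p(\mathbf{Y}_m \mid S_m = s_m)} \ \underset{\mathcal{H}_1}{\overset{\mathcal{H}_0}{\gtrless}}\ x_n, \tag{30}$$

where $h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid i) = 0$ when $s_n \neq i$ and $p(\mathbf{Y}_n \mid S_n = s_n)$ is the PDF of independent Gaussian 
channel measurements $\mathbf{Y}_n = [Y_{n,1}, \ldots, Y_{n,M}]$: 

$$p(\mathbf{Y}_n \mid S_n = s_n) = \prod_{i=1}^{M} \frac{1}{\sqrt{2\pi(\sigma_{n,0}^2 + 1_{[s_n = 0]}\, \sigma_{n,1}^2)}} \exp\left(-\frac{Y_{n,i}^2}{2(\sigma_{n,0}^2 + 1_{[s_n = 0]}\, \sigma_{n,1}^2)}\right). \tag{31}$$
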

Note that when channel occupancies are independent, the above sensor employed by the PHY 
layer approach is equivalent to that of the SP approach, which illustrates Proposition 4. The 
PFA and the PM of this sensor can be evaluated via simulation. In each slot, the detection 
threshold $x_n$ is chosen according to the belief vector so that the resulting PM is fixed at $\zeta$, i.e., 
the design of the spectrum sensor varies with time. 
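
The decision statistic of this sensor can be sketched as follows, assuming real-valued zero-mean Gaussian samples and a joint prior over the occupancy states of the chosen channels derived from the belief vector. The function names, the argument layout, and the convention that the ratio places the idle hypothesis in the numerator are assumptions of this sketch.

```python
import math


def gaussian_pdf(samples, var):
    """Joint PDF of i.i.d. zero-mean Gaussian samples with variance var."""
    return math.prod(
        math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var) for y in samples
    )


def phy_likelihood_ratio(target, meas, prior, noise_var, signal_var):
    """Likelihood ratio of 'target idle' to 'target busy' using all chosen channels.

    meas:  list of sample lists, one per chosen channel
    prior: dict from joint states (tuples, 1 = idle, 0 = busy) to Pr{S_A = s_A};
           both values of the target channel are assumed to have positive probability
    """
    weighted = {0: 0.0, 1: 0.0}  # sum of Pr{s_A} * prod_m p(Y_m | s_m), split by s_target
    mass = {0: 0.0, 1: 0.0}      # Pr{S_target = i}
    for state, prob in prior.items():
        lik = 1.0
        for samples, s_m in zip(meas, state):
            var = noise_var + (signal_var if s_m == 0 else 0.0)
            lik *= gaussian_pdf(samples, var)
        weighted[state[target]] += prob * lik
        mass[state[target]] += prob
    # Dividing by mass[i] turns the joint prior into the conditional h(s_A | S_target = i).
    return (weighted[1] / mass[1]) / (weighted[0] / mass[0])
```

When the prior factorizes across channels, the contributions of the other channels cancel and the ratio reduces to the single-channel ratio, in line with the equivalence to the SP sensor noted above.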

As proven in Propositions 3-5, the PHY layer and the MAC layer approaches are equivalent to 
the SP approach when channels evolve independently. We thus compare below the performance 
of these three approaches in correlated channels. Specifically, we consider $N = 4$ correlated 
channels, each with bandwidth $B_n = 1$. The transition probabilities of the SOS are given by 
$P_{[0000],[0111]} = 0.6$, $P_{[0000],[0000]} = 0.4$, 
$P_{[0111],[0000]} = P_{[1011],[0000]} = P_{[1101],[0000]} = P_{[1110],[0000]} = 0.2$, and 
$P_{[0111],[1011]} = P_{[1011],[1101]} = P_{[1101],[1110]} = P_{[1110],[0111]} = 0.8$. The maximum allowable 
probability of collision is assumed to be $\zeta = 0.05$. In each slot, $L = 3$ channels are chosen. 
The spectrum sensor takes $M = 1$ measurement at each chosen channel, and the noise and the 
primary signal powers are given by $\sigma_{n,0}^2 = 0$ dB and $\sigma_{n,1}^2 = 10$ dB for all $n$. 
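
The transition structure of this example can be written down directly. The sketch below encodes only the transition probabilities listed above and treats every unlisted transition as zero, which is an assumption (it is consistent with each listed row summing to one). The names `STATES`, `P`, and `predict` are hypothetical.

```python
# Joint occupancy states that appear in the listed transition probabilities.
STATES = ["0000", "0111", "1011", "1101", "1110"]
CYCLE = ["0111", "1011", "1101", "1110"]

# P[s_prev][s_next] holds the one-slot transition probability.
P = {s: {} for s in STATES}
P["0000"]["0111"] = 0.6
P["0000"]["0000"] = 0.4
for i, s in enumerate(CYCLE):
    P[s]["0000"] = 0.2
    P[s][CYCLE[(i + 1) % len(CYCLE)]] = 0.8


def predict(belief):
    """One-step prediction: sum over s' of belief[s'] * P[s'][s] for every state s."""
    out = {s: 0.0 for s in STATES}
    for s_prev, prob in belief.items():
        for s_next, q in P[s_prev].items():
            out[s_next] += prob * q
    return out
```

For instance, `predict({"0111": 1.0})` places probability 0.8 on `"1011"` and 0.2 on `"0000"`, so knowing the current state says a great deal about which channels will be occupied in the next slot.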




Fig. 9. Comparison of ROC curves. 



1) Comparison of Sensor Performance: In Fig. 9, we plot the ROC curves ($1 - \delta_n$ vs. $\epsilon_n$) 
of the SP sensor and the sensor employed by the PHY layer approach. Note that the sensor 
employed by the MAC layer approach is the same as the SP sensor. We see that the sensor of 
the PHY approach outperforms the SP sensor. Specifically, for a fixed PM, the PFA of the 
sensor employed by the PHY approach is smaller than that of the SP sensor. This is because the 
sensor of the PHY approach exploits the correlation among channel measurements in detection 
while the SP sensor uses measurements from a single channel. We also observe that the ROC 
curve of the sensor of the PHY approach improves over time while that of the SP sensor remains 
the same. This observation can be explained by comparing the optimal detectors (19) and (30). 
Clearly, the energy detector (19) used by the SP sensor is static and so is its performance. As seen 
from (30), the decision variable of the sensor of the PHY approach depends on the conditional 
distribution $h_{S_{\mathcal{A}}|S_n}(s_{\mathcal{A}} \mid i)$ of the channel occupancies, which varies with time according to the 
belief vector. As time $t$ increases, the belief vector provides more information on the SOS due 
to the accumulating observations, leading to improved sensor performance. Fig. 9 demonstrates 
that the performance of the spectrum sensor can be improved by incorporating the sensing and 
access decisions at the MAC layer, which are encoded in the belief vector. 







Fig. 10. Comparison of normalized throughput (bit units per slot). 

2) Comparison of Throughput Performance: In Fig. 10, we compare the throughput of these 
three approaches. As expected, the SP approach, which ignores the channel correlation, performs 
the worst. By jointly exploiting the sensing outcomes in access decision-making, the MAC layer 
approach can improve throughput performance. A much larger performance gain is achieved by 
the PHY layer approach which jointly exploits the channel measurements in spectrum opportunity 
identification. We can thus see that exploiting channel correlation at the PHY layer is more 
effective than that at the MAC layer. In other words, independent opportunity identification 
at the PHY layer hurts the throughput more than independent access decision-making at the 
MAC layer. This agrees with our intuition because independent opportunity identification makes 
hard decisions on whether the channel is idle. The correlation among the resulting sensing 
outcomes is less informative than that in the original channel measurements, leading to throughput 
degradation. 



VI. Conclusion 

Unique challenges in the design of OSA networks arise from the tension between the secondary 
users' desire for performance and the primary users' need for protection. Such tension dictates the 
interaction between opportunity identification at the physical layer and opportunity exploitation 
at the MAC layer, and a cross-layer approach is necessary to achieve optimality. 

In this paper, we have developed a POMDP framework that captures basic components and 
design tradeoffs in OSA. We have shown that, surprisingly, there exists a separation principle 
in the optimal joint design of OSA that circumvents the curse of dimensionality in general 
POMDPs. Being able to obtain the optimal joint design in closed form allows us to characterize 
quantitatively the interaction between the physical and MAC layers. In particular, we have 
demonstrated how sensing errors at the PHY layer affect MAC design and how incorporating MAC 
layer information into the physical layer leads to a cognitive spectrum sensor whose performance 
improves over time by learning from accumulating observations. 

Appendix A: Proof of Proposition 1 

We first prove the existence of a deterministic optimal sensor operating policy. Suppose that 
channel $n$ is chosen in the current slot. Let $\omega : \mathbb{A}_\delta(n) \to [0, 1]$ be an arbitrary PDF on the set 
$\mathbb{A}_\delta(n)$ of feasible sensor operating points, i.e., 
$\int_{(\epsilon,\delta) \in \mathbb{A}_\delta(n)} \omega(\epsilon, \delta)\, d\epsilon\, d\delta = 1$. We can compute the 
resulting PFA $\epsilon_n$ and the PD $1 - \delta_n$ as 

$$\epsilon_n = \int_{(\epsilon,\delta) \in \mathbb{A}_\delta(n)} \epsilon\, \omega(\epsilon, \delta)\, d\epsilon\, d\delta = \mathbb{E}[\epsilon], \tag{32a}$$

$$1 - \delta_n = \int_{(\epsilon,\delta) \in \mathbb{A}_\delta(n)} (1 - \delta)\, \omega(\epsilon, \delta)\, d\epsilon\, d\delta = \mathbb{E}[1 - \delta]. \tag{32b}$$

Since $0 \le \epsilon \le 1 - \delta \le P_{D,\max}^{(n)}(\epsilon)$ for every sensor operating point in $\mathbb{A}_\delta(n)$, we have 

$$0 \le \epsilon_n \le 1 - \delta_n \le \mathbb{E}\left[P_{D,\max}^{(n)}(\epsilon)\right]. \tag{33}$$

Since the best ROC curve $P_{D,\max}^{(n)}$ is concave, we have 
$\mathbb{E}[P_{D,\max}^{(n)}(\epsilon)] \le P_{D,\max}^{(n)}(\mathbb{E}[\epsilon])$ by Jensen's inequality and hence 
$0 \le \epsilon_n \le 1 - \delta_n \le P_{D,\max}^{(n)}(\epsilon_n)$. That is, the resulting PFA and PM $(\epsilon_n, \delta_n)$ of any randomized 
sensor operating policy $\omega$ belong to the set $\mathbb{A}_\delta(n)$. Therefore, it is sufficient to consider 
deterministic sensor operating policies. 

The spectrum sensor and the access policy should ensure that the collision constraint is satisfied 
no matter which channel is chosen. Let $V_n$ denote the maximum expected remaining reward 
when channel $n$ is chosen in the current slot. Then, the deterministic sensing policy that chooses 
channel $n^* = \arg\max_{n \in \mathbb{A}_s} V_n$ in this slot is optimal since the maximum expected remaining 
reward that can be achieved by a randomized sensing policy is $\sum_{n \in \mathbb{A}_s} V_n\, \mu(n) \le V_{n^*}$, where 
$\mu : \mathbb{A}_s \to [0, 1]$ is a PMF on the set $\mathbb{A}_s$. 

Appendix B: Proof of Theorem 1 

The proof of the separation principle is built upon the following three lemmas. For ease 
of presentation, we define $Q_t(\Lambda \mid A)$ as the maximum expected remaining reward that can 
be obtained starting from slot $t$ given that the current belief vector is $\Lambda$ and action 
$A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$ is taken in this slot, i.e., 

$$Q_t(\Lambda \mid A) = \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \sum_{k=0}^{1} \lambda_{\mathbf{s}'} P_{\mathbf{s}',\mathbf{s}}\, U_{\mathbf{s},k}(A) \left[k B_a + V_{t+1}(\mathcal{T}(\Lambda \mid A, k))\right]. \tag{34}$$

Let $A = \{a, (\epsilon_a, \delta_a), (f_a(0), f_a(1))\} \in \mathbb{A}$ and $A' = \{a, (\epsilon'_a, \delta'_a), (f'_a(0), f'_a(1))\} \in \mathbb{A}$ be two 
actions with the same channel selection but different sensor operating points and transmission 
probabilities. 

Lemma 1: The value function given in (9) is convex in the belief vector. Specifically, at any time 
$t$, the value functions $V_t(\Lambda_1)$ and $V_t(\Lambda_2)$ of any two belief vectors $\Lambda_1 \in \Pi(\mathbb{S})$ and $\Lambda_2 \in \Pi(\mathbb{S})$ 
satisfy 

$$V_t(\tau \Lambda_1 + (1 - \tau) \Lambda_2) \le \tau V_t(\Lambda_1) + (1 - \tau) V_t(\Lambda_2), \quad \text{where } 0 \le \tau \le 1. \tag{35}$$

Proof: We use mathematical induction. From the value function given in (9b), we can see 
that $V_t(\Lambda)$ in the last slot $t = T$ is linear and hence convex in the belief vector $\Lambda$. Suppose 
that $V_t(\Lambda)$ is convex for every slot $t > t_0$. By the definition of convex functions, we can show 
that the maximum remaining reward $Q_t(\Lambda \mid A)$ under an action $A \in \mathbb{A}$ is convex. Since the 
maximum of a set of convex functions is convex, the value function $V_t(\Lambda)$ in slot $t = t_0$ is 
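convex and Lemma 1 follows. 

Lemma 2: If acknowledgement $K_a = 1$ is observed in a slot $t$, then the future reward, given 
by the value function $V_{t+1}(\mathcal{T}(\Lambda \mid A, 1))$, is independent of the sensor operating point $(\epsilon_a, \delta_a)$ 
and the transmission probabilities $(f_a(0), f_a(1))$ employed in the current slot. That is, 

$$V_{t+1}(\mathcal{T}(\Lambda \mid A, 1)) = V_{t+1}(\mathcal{T}(\Lambda \mid A', 1)). \tag{36}$$

Proof: Applying the conditional observation probability $U_{\mathbf{s},1}(A)$ given in (10) to (11), we 
obtain the updated belief vector $\Lambda(t + 1) = \mathcal{T}(\Lambda \mid A, 1)$ whose element $\lambda_{\mathbf{s}}(t + 1)$ is given by 

$$\lambda_{\mathbf{s}}(t + 1) = \frac{\sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}}\, s_a}{\sum_{\tilde{\mathbf{s}} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\tilde{\mathbf{s}}}\, \tilde{s}_a}, \tag{37}$$

where the common factor $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ of $U_{\mathbf{s},1}(A)$ has cancelled in the normalization. 
Hence, $\Lambda(t + 1)$ is independent of the sensor operating point $(\epsilon_a, \delta_a)$ and the transmission probabilities 
$(f_a(0), f_a(1))$. 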
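
The cancellation behind Lemma 2 can be checked numerically. In the sketch below, `predicted[s]` stands for the one-step prediction $\sum_{\mathbf{s}'} \lambda_{\mathbf{s}'} P_{\mathbf{s}',\mathbf{s}}$ and `rho` for the quantity $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$, so that the probability of an acknowledgement in state $\mathbf{s}$ is `s[channel] * rho`. The function name and data layout are hypothetical.

```python
def belief_after_ack(predicted, channel, rho):
    """Bayes update of the belief after an acknowledgement on the sensed channel.

    predicted: dict from joint states (tuples, 1 = idle) to predicted probabilities
    rho:       eps * f(0) + (1 - eps) * f(1), assumed strictly positive
    """
    weights = {s: p * s[channel] * rho for s, p in predicted.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


predicted = {(0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.3}
update_a = belief_after_ack(predicted, 0, 0.4)   # one sensor/access design
update_b = belief_after_ack(predicted, 0, 0.9)   # a different design, same channel
```

Both updates coincide up to rounding, and states in which the sensed channel is busy receive zero probability, since an acknowledgement can only occur on an idle channel.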

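Lemma 3: In any slot $t$, the future rewards $V_{t+1}(\mathcal{T}(\Lambda \mid A, k))$ and $V_{t+1}(\mathcal{T}(\Lambda \mid A', k))$ satisfy 
the following inequality: 

$$V_{t+1}(\mathcal{T}(\Lambda \mid A, 0)) \le \tau V_{t+1}(\mathcal{T}(\Lambda \mid A, 1)) + (1 - \tau) V_{t+1}(\mathcal{T}(\Lambda \mid A', 0)), \tag{38}$$

where $\tau$ is given by 

$$\tau = \frac{\sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}} \left[U_{\mathbf{s},0}(A) - U_{\mathbf{s},0}(A')\right]}{\sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}}\, U_{\mathbf{s},0}(A)}. \tag{39}$$

Proof: Applying the conditional observation probability $U_{\mathbf{s},k}(A)$ given in (10) to (11), we 
can obtain the updated belief vectors $\mathcal{T}(\Lambda \mid A, k)$ and $\mathcal{T}(\Lambda \mid A', k)$. After some algebra, we 
reach the following equality: 

$$\mathcal{T}(\Lambda \mid A, 0) = \tau\, \mathcal{T}(\Lambda \mid A, 1) + (1 - \tau)\, \mathcal{T}(\Lambda \mid A', 0), \tag{40}$$

where $\tau$ is given by (39). Lemma 3 follows from the convexity of the value function proven in 
Lemma 1. 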
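
The belief decomposition used in this proof can be verified numerically under the same conventions as before: `predicted[s]` is the one-step predicted probability of state $\mathbf{s}$, and `rho` and `rho_prime` are the values of $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ under $A$ and $A'$, with `rho_prime >= rho` so that the mixing weight lies in $[0, 1]$. All names are hypothetical.

```python
def belief_after_ack(predicted, channel):
    """Belief after an acknowledgement; it does not depend on rho (Lemma 2)."""
    weights = {s: p * s[channel] for s, p in predicted.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def belief_no_ack(predicted, channel, rho):
    """Belief after no acknowledgement, with Pr{no ack | s} = 1 - s[channel] * rho."""
    weights = {s: p * (1.0 - s[channel] * rho) for s, p in predicted.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def mixing_weight(predicted, channel, rho, rho_prime):
    """Ratio of sum_s p_s [U_{s,0}(A) - U_{s,0}(A')] to sum_s p_s U_{s,0}(A)."""
    num = sum(p * s[channel] * (rho_prime - rho) for s, p in predicted.items())
    den = sum(p * (1.0 - s[channel] * rho) for s, p in predicted.items())
    return num / den


predicted = {(0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.3}
tau = mixing_weight(predicted, 0, 0.3, 0.7)
lhs = belief_no_ack(predicted, 0, 0.3)
ack = belief_after_ack(predicted, 0)
rhs_part = belief_no_ack(predicted, 0, 0.7)
rhs = {s: tau * ack[s] + (1.0 - tau) * rhs_part[s] for s in predicted}
```

Here `lhs` and `rhs` agree entry by entry, which is the convex-combination identity that, together with the convexity of the value function, yields the inequality of the lemma.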
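With the above three lemmas, we now prove the separation principle. First notice that the 
expected immediate reward $\mathbb{E}[R_{K_a}(t) \mid \Lambda(t)]$ can be obtained as 

$$\mathbb{E}[R_{K_a}(t) \mid \Lambda(t)] = B_a \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}}\, U_{\mathbf{s},1}(A)$$

$$= \left[\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)\right] B_a \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}}\, s_a. \tag{41}$$

Since $B_a \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}}\, s_a$ is constant for given belief vector $\Lambda(t)$ and sensing action 
$a$, the expected immediate reward $\mathbb{E}[R_{K_a}(t) \mid \Lambda(t)]$ increases with $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$. 

Second, we note that the sensor operating point $(\epsilon_a, \delta_a)$ and the transmission probabilities 
$(f_a(0), f_a(1))$ only affect the expected remaining reward $Q_t(\Lambda(t) \mid A)$ defined in (34) through 
the observation probability $U_{\mathbf{s},1}(A) = s_a[\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)]$. Therefore, if we 
can show that $Q_t(\Lambda(t) \mid A)$ increases with the quantity $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$, then this will 
prove the separation principle. 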

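To this end, we consider two actions $A$ and $A'$ such that 
$\epsilon'_a f'_a(0) + (1 - \epsilon'_a) f'_a(1) \ge \epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ in slot $t$. 
Comparing the resulting maximum expected remaining rewards $Q_t(\Lambda(t) \mid A')$ 
and $Q_t(\Lambda(t) \mid A)$, we obtain that 

$$\begin{aligned}
& Q_t(\Lambda(t) \mid A') - Q_t(\Lambda(t) \mid A) \\
&= \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}} \Big\{ B_a \left[U_{\mathbf{s},1}(A') - U_{\mathbf{s},1}(A)\right] \\
&\qquad + \sum_{k=0}^{1} \left[U_{\mathbf{s},k}(A') V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', k)) - U_{\mathbf{s},k}(A) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))\right] \Big\} \\
&\ge \sum_{\mathbf{s} \in \mathbb{S}} \sum_{\mathbf{s}' \in \mathbb{S}} \lambda_{\mathbf{s}'}(t) P_{\mathbf{s}',\mathbf{s}} \sum_{k=0}^{1} \left[U_{\mathbf{s},k}(A') V_{t+1}(\mathcal{T}(\Lambda(t) \mid A', k)) - U_{\mathbf{s},k}(A) V_{t+1}(\mathcal{T}(\Lambda(t) \mid A, k))\right].
\end{aligned} \tag{42}$$

Applying Lemmas 2 and 3, we obtain after some algebra: 

$$Q_t(\Lambda(t) \mid A') - Q_t(\Lambda(t) \mid A) \ge 0, \tag{43}$$

which proves the monotonicity of the expected remaining reward $Q_t(\Lambda(t) \mid A)$ with 
$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ and hence completes the proof of the separation principle. 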

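Appendix C: Proof of Proposition 2 

When $\delta_a = 1$, we have $\epsilon_a = 0$ and the objective function $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ given in (13a) 
is maximized when $f_a^*(1) = \zeta$. When $\delta_a \in [0, 1)$, the constraint given in (13) can be written as 

$$0 \le f_a(0) \le \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a}. \tag{44}$$

Applying (44) to the objective function in (13a), we obtain that 

$$\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1) \le f_a(1) \left(1 - \frac{\epsilon_a}{1 - \delta_a}\right) + \frac{\epsilon_a \zeta}{1 - \delta_a}, \tag{45}$$

where the equality holds when $f_a(0) = \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a}$. Since $1 - \delta_a \ge \epsilon_a$ (see footnote 2), the right 
hand side of (45) increases with $f_a(1)$. Hence, to maximize the objective function $\epsilon_a f_a(0) + 
(1 - \epsilon_a) f_a(1)$, we should choose the largest $f_a(1)$ such that $f_a(0) = \frac{\zeta - \delta_a f_a(1)}{1 - \delta_a} \ge 0$ (see (44)). 

Therefore, when $\delta_a < \zeta$, $f_a^*(1) = 1$ and correspondingly $f_a^*(0) = \frac{\zeta - \delta_a}{1 - \delta_a}$. When $\delta_a \ge \zeta$, $f_a^*(1) = \frac{\zeta}{\delta_a}$ 
and correspondingly $f_a^*(0) = 0$. 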
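
The closed form above can be compared against a brute-force search. The sketch below assumes $0 < \zeta < 1$ and $1 - \delta_a \ge \epsilon_a$; the function names and the grid resolution are choices made for this illustration only.

```python
def optimal_access(delta, zeta):
    """Closed-form transmission probabilities (f(0), f(1)) for miss probability delta."""
    if delta < zeta:
        return (zeta - delta) / (1.0 - delta), 1.0
    return 0.0, zeta / delta


def objective(eps, f0, f1):
    """eps * f(0) + (1 - eps) * f(1), the quantity maximized by the access policy."""
    return eps * f0 + (1.0 - eps) * f1


def grid_best(eps, delta, zeta, steps=200):
    """Best objective over a grid of feasible (f(0), f(1)) pairs."""
    best = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            f0, f1 = i / steps, j / steps
            # Collision constraint: (1 - delta) * f(0) + delta * f(1) <= zeta.
            if (1.0 - delta) * f0 + delta * f1 <= zeta + 1e-12:
                best = max(best, objective(eps, f0, f1))
    return best


f0, f1 = optimal_access(0.02, 0.05)
closed_form_value = objective(0.3, f0, f1)
grid_value = grid_best(0.3, 0.02, 0.05)
```

The grid value never exceeds the closed-form value, and the closed-form solution meets the collision constraint with equality, matching the argument that the constraint should be active at the optimum.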
Appendix D: Proof of Theorem 2 

Applying the optimal transmission probabilities $(f_a^*(0), f_a^*(1))$ given in Proposition 2 to the 
objective function (13a), we obtain that 

$$\epsilon_a f_a^*(0) + (1 - \epsilon_a) f_a^*(1) = \begin{cases} 1 - \dfrac{\epsilon_a (1 - \zeta)}{1 - \delta_a}, & \delta_a < \zeta, \\[2mm] \dfrac{(1 - \epsilon_a)\, \zeta}{\delta_a}, & \delta_a \ge \zeta. \end{cases} \tag{46}$$

Since the best ROC curve is concave [25, Sec. 2.2], both $\frac{\epsilon_a}{1 - \delta_a}$ and $\frac{1 - \epsilon_a}{\delta_a}$ increase with $\epsilon_a$ and 
hence decrease with $\delta_a$. From (46), we can see that the objective function $\epsilon_a f_a(0) + (1 - \epsilon_a) f_a(1)$ 
increases with $\delta_a$ when $\delta_a < \zeta$ but decreases when $\delta_a > \zeta$. Hence, the maximum is achieved 
when $\delta_a^* = \zeta$. Correspondingly, the optimal transmission probabilities $(f_a^*(0), f_a^*(1))$ are given 
by $(0, 1)$. 
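
A numerical sweep can make this conclusion concrete. The sketch below uses, as an assumed stand-in for the best ROC curve, the ROC of a threshold test on $|Y|$ for one real Gaussian sample with noise power 1 and signal power 10 in linear scale; it combines each operating point with the access probabilities of Proposition 2 and looks for the miss probability that maximizes the objective. The names and the grid are hypothetical.

```python
from statistics import NormalDist


def roc_pfa(delta, noise_var=1.0, signal_var=10.0):
    """PFA of the single-sample threshold detector whose PM equals delta."""
    std_normal = NormalDist()
    t = (noise_var + signal_var) ** 0.5 * std_normal.inv_cdf((1.0 + delta) / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(t / noise_var ** 0.5))


def throughput_factor(delta, zeta):
    """eps * f(0) + (1 - eps) * f(1) with the access probabilities of Proposition 2."""
    eps = roc_pfa(delta)
    if delta <= zeta:
        f0, f1 = (zeta - delta) / (1.0 - delta), 1.0
    else:
        f0, f1 = 0.0, zeta / delta
    return eps * f0 + (1.0 - eps) * f1


zeta = 0.05
grid = [i / 1000 for i in range(1, 1000)]
best_delta = max(grid, key=lambda d: throughput_factor(d, zeta))
```

On this grid the maximizer coincides with the collision constraint, `best_delta == 0.05`, at which point the access rule is simply to transmit if and only if the channel is sensed idle.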

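Appendix E: Proof of Theorem 3 

Let $A^{(L)} = \{\mathcal{A}, \{(\epsilon_n, \delta_n)\}_{n \in \mathcal{A}}, \{(f_n(0), f_n(1))\}_{n \in \mathcal{A}}\}$ and 
$A_n \triangleq \{n, (\epsilon_n, \delta_n), (f_n(0), f_n(1))\} \in \mathbb{A}$, 
where $A_n$ corresponds to the actions taken on chosen channel $n \in \mathcal{A}$. When the spectrum 
sensor is designed independently across channels, we can write 
$\ell_{\Theta_{\mathcal{A}}|S_{\mathcal{A}}}(\theta_{\mathcal{A}} \mid s_{\mathcal{A}}) = \Pr\{\Theta_{\mathcal{A}} = \theta_{\mathcal{A}} \mid S_{\mathcal{A}} = s_{\mathcal{A}}\} = \prod_{n \in \mathcal{A}} \Pr\{\Theta_n = \theta_n \mid S_n = s_n\}$ 
in a product form since the occupancy of 
a channel is detected independently of the measurements at other chosen channels. When the 
access policy is designed independently across channels, we have $f_n(\theta_{\mathcal{A}}) = f_n(\theta_n)$ for any 
sensing outcomes $\theta_{\mathcal{A}} \in \{0, 1\}^L$. Therefore, we can write the conditional observation probability 
$U_{\mathbf{s},\mathbf{k}_{\mathcal{A}}}(A^{(L)})$ as 

$$\begin{aligned}
U_{\mathbf{s},\mathbf{k}_{\mathcal{A}}}(A^{(L)}) &= \sum_{\theta_{\mathcal{A}} \in \{0,1\}^L} \prod_{n \in \mathcal{A}} \Pr\{\Theta_n = \theta_n \mid S_n = s_n\} \left[k_n s_n f_n(\theta_n) + (1 - k_n)(1 - s_n f_n(\theta_n))\right] \\
&= \prod_{n \in \mathcal{A}} \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n = \theta_n \mid S_n = s_n\} \left[k_n s_n f_n(\theta_n) + (1 - k_n)(1 - s_n f_n(\theta_n))\right] \\
&= \prod_{n \in \mathcal{A}} U_{\mathbf{s},k_n}(A_n).
\end{aligned} \tag{47}$$

Similarly, after some algebra, the design constraint in (25c) can be written as 

$$P_n(t) = \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n = \theta_n \mid S_n = 0\} f_n(\theta_n) = (1 - \delta_n) f_n(0) + \delta_n f_n(1) \le \zeta, \quad \forall n \in \mathcal{A}. \tag{48}$$

Applying (47) to (25), we can see that the sensor operating point $(\epsilon_n, \delta_n)$ and transmission 
probabilities $(f_n(0), f_n(1))$ of a chosen channel $n \in \mathcal{A}$ affect the maximum remaining reward 
only through $U_{\mathbf{s},1}(A_n) = s_n[\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)]$, which is independent of the actions 
$\{A_m\}_{m \in \mathcal{A} \setminus \{n\}}$ taken on the other channels. Moreover, the simplified constraint (48) reveals that 
the collision probability of a channel $n$ is also independent of the actions $\{A_m\}_{m \in \mathcal{A} \setminus \{n\}}$ taken 
at other channels. Therefore, the design of the sensor operating and access policies can be 
decoupled across channels. Following the same proof as given in Appendix B, we can show that 
the expected remaining reward increases with $\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)$ of every chosen channel 
$n \in \mathcal{A}$. 

On the other hand, the expected immediate reward $\mathbb{E}[R_{K_{\mathcal{A}}}(t) \mid \Lambda(t)]$ is given by 

$$\mathbb{E}[R_{K_{\mathcal{A}}}(t) \mid \Lambda(t)] = \sum_{n \in \mathcal{A}} B_n \Pr\{K_n = 1\} = \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\} \left[\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)\right], \tag{49}$$

which also increases with $\epsilon_n f_n(0) + (1 - \epsilon_n) f_n(1)$. Therefore, the separation principle developed 
in Theorem 1 holds for $L > 1$. 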
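
The product form of the observation probability can be confirmed by enumeration. In the sketch below, each channel has its own PFA `eps[n]`, PM `delta[n]`, and pair of transmission probabilities `f[n] = (f_n(0), f_n(1))`; the joint sum over all sensing outcomes is compared with the product of per-channel sums. The function names and the example values are hypothetical.

```python
from itertools import product


def per_channel_term(theta, s, k, eps, delta, f):
    """Pr{Theta = theta | S = s} * [k * s * f(theta) + (1 - k) * (1 - s * f(theta))]."""
    if s == 1:
        p_theta = eps if theta == 0 else 1.0 - eps      # idle channel: false alarm is eps
    else:
        p_theta = 1.0 - delta if theta == 0 else delta  # busy channel: miss is delta
    return p_theta * (k * s * f[theta] + (1 - k) * (1.0 - s * f[theta]))


def joint_sum(s, k, eps, delta, f):
    """Sum over all joint sensing outcomes of the product of per-channel terms."""
    total = 0.0
    for theta in product((0, 1), repeat=len(s)):
        term = 1.0
        for n in range(len(s)):
            term *= per_channel_term(theta[n], s[n], k[n], eps[n], delta[n], f[n])
        total += term
    return total


def product_form(s, k, eps, delta, f):
    """Product over channels of the per-channel sums over theta_n in {0, 1}."""
    out = 1.0
    for n in range(len(s)):
        out *= sum(per_channel_term(th, s[n], k[n], eps[n], delta[n], f[n]) for th in (0, 1))
    return out
```

For any states, acknowledgements, and per-channel designs, `joint_sum` and `product_form` return the same value up to rounding, which is what allows the design to be decoupled across channels.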

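Appendix F: Proof of Propositions 3 and 5 

Let $\mathcal{A} \in \mathbb{A}_s$ denote a set of chosen channels and $\mathcal{A}_n = \mathcal{A} \setminus \{n\}$ be the set of chosen 
channels excluding $n$. Since channels evolve independently, we have 
$h_{S_{\mathcal{A}_n}|S_n}(s_{\mathcal{A}_n} \mid 0) = h_{S_{\mathcal{A}_n}|S_n}(s_{\mathcal{A}_n} \mid 1)$, where 
$h_{S_{\mathcal{A}_n}|S_n}(s_{\mathcal{A}_n} \mid i) = \Pr\{S_{\mathcal{A}_n} = s_{\mathcal{A}_n} \mid S_n = i\}$. Hence, given belief vector 
$\Lambda(t)$ and chosen channels $\mathcal{A}$ in slot $t$, the myopic (i.e., locally optimal) sensor operating points 
$(\epsilon_n, \delta_n)$ and transmission probabilities $\mathcal{F} = \{f_n(\theta_{\mathcal{A}})\}$ are given by (cf. (27)) 

$$\begin{aligned}
\{(\epsilon_n^*, \delta_n^*), \mathcal{F}^*\} &= \arg\max\ \mathbb{E}\left[R_{K_{\mathcal{A}}}(t) \mid \Lambda(t)\right] \\
&= \arg\max \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\} \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n = \theta_n \mid S_n = 1\}\, g_n(\theta_n) \\
&= \arg\max_{(\epsilon_n, \delta_n) \in \mathbb{A}_\delta(n),\ \mathcal{F}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\} \left[\epsilon_n g_n(0) + (1 - \epsilon_n) g_n(1)\right]
\end{aligned} \tag{50a}$$

$$\text{s.t.}\quad P_n(t) = \sum_{\theta_n = 0}^{1} \Pr\{\Theta_n = \theta_n \mid S_n = 0\}\, g_n(\theta_n) = (1 - \delta_n) g_n(0) + \delta_n g_n(1) \le \zeta, \quad \forall n \in \mathcal{A}, \tag{50b}$$

where $g_n(\theta_n) \in [0, 1]$ is defined as 

$$g_n(\theta_n) = \sum_{\theta_{\mathcal{A}_n},\, s_{\mathcal{A}_n}} f_n(\theta_{\mathcal{A}_n}, \theta_n) \Pr\{S_{\mathcal{A}_n} = s_{\mathcal{A}_n}\} \prod_{m \in \mathcal{A}_n} \Pr\{\Theta_m = \theta_m \mid S_m = s_m\}. \tag{51}$$

We see from (50) that the myopic approach should maximize $\epsilon_n g_n(0) + (1 - \epsilon_n) g_n(1)$ under the 
constraint $(1 - \delta_n) g_n(0) + \delta_n g_n(1) \le \zeta$ for every chosen channel $n \in \mathcal{A}$, leading to the same 
optimization problem as (13). By Theorem 2, $\delta_n^* = \zeta$ and $(g_n^*(0), g_n^*(1)) = (0, 1)$ are the solution 
to (50). That is, the SP sensor is locally optimal. Furthermore, since $(g_n(0), g_n(1)) = (0, 1)$ is 
achieved by choosing $f_n(\theta_{\mathcal{A}_n}, \theta_n) = 1_{[\theta_n = 1]}$ in (51), transmission probabilities $f_n(\theta_{\mathcal{A}}) = 1_{[\theta_n = 1]}$ are 
locally optimal, which completes the proof of Proposition 3. 

Proposition 5 follows directly from the fact that the MAC layer approach employs the myopic 
access policy and the SP sensor, which has been proven to be locally optimal. 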

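Appendix G: Proof of Proposition 4 

When the access policy is designed independently across channels, we have $f_n(\theta_{\mathcal{A}}) = f_n(\theta_n)$ 
for any sensing outcome $\Theta_{\mathcal{A}} = \theta_{\mathcal{A}}$ from chosen channels $\mathcal{A}$. Hence, given belief vector 
$\Lambda(t)$ and chosen channels $\mathcal{A}$ in slot $t$, the myopic spectrum sensor $\mathcal{S}$ and access decisions 
$\{(f_n(0), f_n(1))\}_{n \in \mathcal{A}}$ are given by 

$$\begin{aligned}
\{\mathcal{S}^*, \{(f_n^*(0), f_n^*(1))\}_{n \in \mathcal{A}}\} = \arg\max_{\mathcal{S},\ \{f_n(0), f_n(1) \in [0,1]\}_{n \in \mathcal{A}}} \sum_{n \in \mathcal{A}} B_n \Pr\{S_n = 1\} \big[ & \Pr\{\Theta_n = 1 \mid S_n = 1\} f_n(1) \\
& + \Pr\{\Theta_n = 0 \mid S_n = 1\} f_n(0) \big]
\end{aligned} \tag{52a}$$

$$\text{s.t.}\quad P_n(t) = \Pr\{\Theta_n = 1 \mid S_n = 0\} f_n(1) + \Pr\{\Theta_n = 0 \mid S_n = 0\} f_n(0) \le \zeta, \quad \forall n \in \mathcal{A}, \tag{52b}$$

where 

$$\Pr\{\Theta_n = \theta_n \mid S_n = s_n\} = \sum_{\theta_{\mathcal{A}_n},\, s_{\mathcal{A}_n}} h_{S_{\mathcal{A}_n}|S_n}(s_{\mathcal{A}_n} \mid s_n) \Pr\{\Theta_{\mathcal{A}_n} = \theta_{\mathcal{A}_n}, \Theta_n = \theta_n \mid S_{\mathcal{A}_n} = s_{\mathcal{A}_n}, S_n = s_n\} \tag{53}$$

is determined by the sensor operating point $\mathcal{S}$. Since (52) has the same form as (13), the 
PHY layer approach is locally optimal. 

Furthermore, when the SOS evolves independently across channels, the measurements from 
different channels are independent. Hence, the sensor employed by the PHY layer approach is 
equivalent to the SP sensor. 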